e53bd7cbd1
110 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e53bd7cbd1
|
feat(ui): generate dashboard API types from the proxy OpenAPI spec (#29816)
* feat(ui): generate dashboard API types from the proxy OpenAPI spec Introduces the shared type foundation for the dashboard without touching any runtime code. The proxy's FastAPI app is the source of truth; app.openapi() emits the spec and openapi-typescript turns it into src/lib/http/schema.d.ts. Adds an npm run gen:api script (a Python spec dump piped into openapi-typescript) and a Check UI API Types Sync CI job that regenerates the file from the live spec and fails if it drifts, so the committed types can never silently fall out of step with the backend. The generated file is pinned to openapi-typescript 7.13.0 and excluded from prettier, eslint, and knip, and marked linguist-generated so it collapses in diffs. No openapi-fetch and no call-site changes yet; this only makes the types exist. * chore(ui): tidy gen-api-types script per review Write the spec dump inside a with-block and clean up the temp dir in a finally, so repeated local runs don't leave stray ~MB JSON files behind. |
||
|
|
c7f1bcfd0d
|
build(ui): migrate eslint to flat config and bump eslint-config-next to 16 (#29626)
ESLint 9 defaults to flat config and eslint-config-next was pinned at 15 while Next is on 16, so eslint only ran with ESLINT_USE_FLAT_CONFIG=false and next lint is gone on Next 16. Replace .eslintrc.json with a native flat eslint.config.mjs (config-next 16 ships flat configs, so no FlatCompat shim is needed), bump eslint-config-next to 16.2.6, add @eslint/js and typescript-eslint as explicit devDeps for the recommended rule sets, and point the lint script at eslint directly. This only makes eslint runnable on modern tooling; it does not wire it into CI. The same rules carry over (next/core-web-vitals, eslint and typescript-eslint recommended, prettier, unused-imports) |
||
|
|
0715ed3359
|
build(deps): bump next from 16.2.4 to 16.2.6 in /ui/litellm-dashboard (#27665) (#28524)
Bumps [next](https://github.com/vercel/next.js) from 16.2.4 to 16.2.6. - [Release notes](https://github.com/vercel/next.js/releases) - [Changelog](https://github.com/vercel/next.js/blob/canary/release.js) - [Commits](https://github.com/vercel/next.js/compare/v16.2.4...v16.2.6) --- updated-dependencies: - dependency-name: next dependency-version: 16.2.6 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
faba1fda8a
|
Merge remote-tracking branch 'origin/main' into litellm_merge_main_staging
Restore lineage between main and internal staging so the next staging->main promotion (#27436) can merge without conflicts. main was 2 commits ahead: - |
||
|
|
63bda3f001
|
Merge remote-tracking branch 'upstream/litellm_internal_staging' into cve-sweep-2026-05
# Conflicts: # uv.lock |
||
|
|
454ce5073f
|
fix(anthropic, mcp): sanitize tool names to match Anthropic's [a-zA-Z0-9_-]{1,128} pattern (#26788)
* fix(anthropic, mcp): sanitize tool names to match Anthropic's `^[a-zA-Z0-9_-]{1,128}$`
Tool names with characters like `/` or `.` (commonly produced by the
OpenAPI -> MCP generator from `operationId`s such as
`actions/download-job-logs-for-workflow-run`) caused Anthropic to reject
requests with `tools.N.custom.name: String should match pattern
'^[a-zA-Z0-9_-]{1,128}$'`.
Two layers of fix:
1. Anthropic transformation: build a per-request forward map (original ->
sanitized, disambiguated by suffix on collisions) and a reverse map
(only for names actually rewritten). Forward map is applied to tool
defs, `tool_choice`, and historical assistant tool_calls in messages.
Reverse map is threaded through both the non-streaming and streaming
response paths so callers continue to see their original tool names
in `tool_use` blocks.
2. OpenAPI -> MCP generator: sanitize `operationId` (and the
method+path fallback) at registration time so generated MCP tools are
valid for any strict-name provider, not just Anthropic. The dashboard
preview endpoint applies the same sanitization for parity.
Includes unit tests covering: collision disambiguation between
`foo_bar` and `foo/bar` in the same request, reverse-map only firing
for actually-rewritten names, message rewrite for historical tool_calls,
streaming chunk_parser reverse-mapping, and sanitization of OpenAPI
operationIds plus the preview endpoint output.
Made-with: Cursor
* fix(anthropic): build tool-name maps in transform_request, not optional_params
The previous patch stashed the per-request forward and reverse tool-name
maps under ``optional_params["_anthropic_tool_name_forward_map"]`` and
``optional_params["_anthropic_tool_name_map"]``. ``optional_params`` is
the dict that becomes the JSON body via ``data = {**optional_params}``,
so those internal keys leaked over the wire and Anthropic 400'd with:
_anthropic_tool_name_forward_map: Extra inputs are not permitted
Worse, this meant *every* request whose tool list contained any name with
an invalid character (the exact case the patch was meant to fix) regressed
into a confusing meta-error pointing at LiteLLM's internal map instead of
the offending tool.
Fix: move all tool-name sanitization into ``transform_request``, which is
the single chokepoint already shared by ``AnthropicConfig``,
``AmazonAnthropicConfig`` (Bedrock invoke), ``VertexAIAnthropicConfig``,
and ``AzureAnthropicConfig`` (all call ``super().transform_request`` /
``AnthropicConfig.transform_request(self, ...)``). New static helper
``_sanitize_tool_names_in_request`` walks the already-Anthropic-shaped
``optional_params["tools"]`` (only ``type=="custom"`` entries -- hosted
tool names are reserved by Anthropic and must not be touched), builds
the per-request forward/reverse maps, and applies the forward map in
place to ``tools[*].name`` and ``tool_choice.name``. The reverse map is
stashed exclusively on ``litellm_params`` (which is never serialized to
a provider) under ``_anthropic_tool_name_map`` for the response paths
to consume.
Side effect of this restructure: ``map_openai_params`` is now a pure
OpenAI->Anthropic param translator with no side-channel state, which
matches its contract everywhere else in the codebase.
Tests: replaced the now-incorrect "stashes maps in optional_params"
tests with regressions that assert no underscore-prefixed keys appear
in either ``optional_params`` after ``map_openai_params`` or in the
final ``transform_request`` body. Added end-to-end coverage for:
sanitization in ``transform_request``, ``tool_choice`` rewriting,
historical ``tool_calls`` rewriting in messages, and hosted-tool
passthrough.
Made-with: Cursor
* fix(anthropic): always sanitize empty text content blocks
Anthropic 400s on `{"role": "user", "content": ""}` with:
"messages: text content blocks must be non-empty"
LiteLLM already had `_sanitize_empty_text_content` to rewrite empty text
to a placeholder, but it was gated behind `litellm.modify_params=True`.
With that flag off (default), empty content from upstream agent
frameworks (e.g. pydantic-ai) flowed straight through and tripped the
Anthropic validator.
Fix:
- Always run `_sanitize_empty_text_content` at the top of
`anthropic_messages_pt`, independent of `modify_params`. There is no
way to "pass through" an empty text block, so this is non-optional.
The richer tool-call sanitizations (Cases A/B/D, which actually
mutate conversation structure) remain gated on `modify_params`.
- Extend `_sanitize_empty_text_content` to also handle list-of-blocks
content (`[{"type": "text", "text": ""}]`), not just string content.
Adds 3 regression tests covering string content, list-of-blocks
content, and the no-op case (non-empty messages with modify_params off).
Made-with: Cursor
* fix(anthropic): drop dead tool-name forward-map params, fix mypy + caller-mutation
- remove unused `name_forward_map` param from `_map_tool_choice`,
`_map_tool_helper`, `_map_tools` and the `_apply_anthropic_tool_name_forward`
helper. Production sanitization runs in `_sanitize_tool_names_in_request`
at `transform_request`; these params were never threaded through.
- handler.py: use `ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY` constant instead of
the hardcoded `"_anthropic_tool_name_map"` string.
- fix mypy `"object" has no attribute "__iter__"` in
`_rewrite_tool_names_in_messages` by guarding `tool_calls` with
`isinstance(..., list)`.
- `_sanitize_tool_names_in_request`: build a new tools list with copy-on-
change entries (and copy `tool_choice` on rewrite) so a caller reusing
the same tool list/dicts across requests doesn't see its inputs
permanently rewritten.
- doc-comment `_build_request_tool_name_maps` clarifying it operates on
OpenAI-format tools (vs `_sanitize_tool_names_in_request` which runs
on Anthropic-format tools post-`_map_tools`).
- tests: drop 3 tests pinning the now-removed param paths; add coverage
for tool_calls + None function_call rewrite and caller-dict immutability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(mcp): inherit stored credentials in test/tools/list for edit flow
When editing an existing MCP server, the Tool Configuration preview
calls POST /mcp-rest/test/tools/list with server_id but no credentials
(management API redacts them). The endpoint now calls
_inherit_credentials_from_existing_server() so stored bearer tokens
and OAuth2 M2M credentials are loaded from global_mcp_server_manager
automatically — tools load without re-entering credentials.
New servers (no server_id) and requests with explicit credentials are
unaffected (function is a no-op in both cases).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(mcp): show all tools in edit panel, not just allowed tools
Edit flow was passing externalTools (from GET /tools/list, filtered by
allowed_tools) to MCPToolConfiguration, disabling the internal hook.
Remove the external props so the internal hook fires via
POST /test/tools/list, which returns all tools unfiltered. Combined
with the credential inheritance fix, tools load automatically without
re-entering credentials and all tools are visible for re-configuration.
existingAllowedTools still pre-checks previously allowed tools.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix order-dependent collision in _build_anthropic_tool_name_maps
Use a two-pass approach: first pre-register all already-valid tool names
in the 'used' set, then sanitize/disambiguate names that need rewriting.
This ensures valid names always have priority regardless of input order,
preventing duplicate tool names on the wire when e.g. 'foo/bar' appears
before 'foo_bar' in the tool list.
Add regression test for the reversed ordering case.
* Fix OpenAPI tool name collision: disambiguate sanitized names with numeric suffixes
sanitize_openapi_tool_name replaces all invalid chars with '_', but when
two operationIds differ only by sanitized characters (e.g. 'foo/list' and
'foo.list' both become 'foo_list'), the second registration silently
overwrites the first in the tool registry.
Add collision disambiguation in register_tools_from_openapi that appends
_2, _3, ... suffixes when a sanitized name is already taken, mirroring
the existing logic in _build_anthropic_tool_name_maps.
* Fix preview endpoint missing collision disambiguation for tool names
Add used_names tracking and _2/_3 suffix disambiguation to
_preview_openapi_tools, matching the logic in register_tools_from_openapi.
Without this, two operationIds that sanitize to the same string (e.g.
'foo/list' and 'foo.list' both becoming 'foo_list') would show duplicate
names in the preview while registration would disambiguate them.
* Align preview HTTP method order with register_tools_from_openapi
The preview endpoint and register_tools_from_openapi both use
order-dependent collision disambiguation (_2, _3 suffixes). When the
iteration order differs, two operations on the same path with sanitized
names that collide get different suffixes in preview vs registration,
so the dashboard shows names that don't match what actually got
registered.
Also adds a regression test that fails on the swapped order.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* Skip duplicate originals in _build_anthropic_tool_name_maps
If the same invalid tool name appeared twice in original_names (e.g.
['foo/bar', 'foo/bar']), the second occurrence overwrote the forward
map entry with a freshly-suffixed name (foo_bar_2), leaving foo_bar
orphaned in 'used' with no reverse mapping. _sanitize_tool_names_in_request
then rewrote both tool entries to foo_bar_2, and Anthropic 400'd on
duplicate tool names.
Skip the rewrite if forward already has the original mapped.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
|
||
|
|
6ff668c7aa
|
[Infra] Promote internal staging to main (#27245)
* default requested_model to empty string on litellm-side rejects * Update litellm/router.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: scope key access_group_ids override by team's assigned groups A team member could set any access_group_ids on their key (e.g. a group assigned only to a different team) and override the team's model restriction. Intersect the key's access_group_ids with team_object.access_group_ids in _key_access_group_grants_model so foreign groups are dropped before model expansion. Adds a regression test that asserts expansion is never called for foreign groups. * [Fix] Proxy: Skip Personal Budget Hook When Reservation Covers Counter The reservation path (PR #26845) atomically pre-fills `spend:user:{user_id}` and admits at the strict-`<` boundary. The legacy `_PROXY_MaxBudgetLimiter` pre-call hook re-reads the same counter with `>=`, so a reservation that fills the counter to exactly `max_budget` (e.g. a request without a `max_tokens` cap that falls back to reserving the smallest remaining headroom) is rejected by the hook even though the reservation already admitted it. Skip the hook when the request's active `budget_reservation` covers `spend:user:{user_id}`. The reservation is the source of truth for that counter cross-pod; the legacy `>=` path remains in place for requests without a reservation (e.g. paths that bypass the reservation entirely). Reproduces as `tests/otel_tests/test_prometheus.py::test_user_budget_metrics` on a fresh user with `max_budget=10` calling `fake-openai-endpoint` without `max_tokens`. Adds focused unit coverage in `tests/test_litellm/proxy/hooks/test_max_budget_limiter.py`. * harden bedrock file bucket validation * Fix syntax errors from botched merge in router.py * Fix Vertex batch output edge cases * [Fix] RBAC: Drop management_routes Write Fallback for Admin Viewer Greptile P1: the unsafe-method branch of `_check_proxy_admin_viewer_access` ended with a blanket `if route in management_routes: return`. That set is a mix of reads (info/list — handled via the safe-method GET branch above) and writes. The fallback let Admin Viewer POST to write endpoints not enumerated in `_ADMIN_VIEWER_BLOCKED_WRITE_ROUTES`, including: - /team/block, /team/unblock, /team/permissions_update - /jwt/key/mapping/{new,update,delete} - /key/bulk_update - /key/{key_id}/reset_spend Remove the fallback. The two remaining allow sets (admin_viewer_routes and global_spend_tracking_routes) are both read-only, so removal does not affect the legitimate POST-as-read cases (e.g. /spend/calculate, which is in spend_tracking_routes ⊂ admin_viewer_routes). Tests: - 8 new parametrized cases pinning each previously-leaking management write endpoint to 403 on POST for PROXY_ADMIN_VIEW_ONLY. * fix(tests): anchor VCR redis cassette key to repo root `os.path.relpath` with no `start` arg uses the current working directory, so running pytest from a subdirectory produced a different Redis key than running from the repo root. CI-recorded cassettes and locally-replayed runs would silently miss each other's cache. Anchor the path to the repo root (derived from `__file__`) so the key is stable regardless of CWD. https://claude.ai/code/session_018uCx7pcrkdUJZrCVMaTdPx * fix: gate key access_group override on group's own assignment Replaces the previous intersect-with-team.access_group_ids check, which made the override unreachable in practice (the team-gate fallback already covered every case the intersection allowed). The override now resolves each of the key's access_group_ids via get_access_object and accepts the group only if its assigned_team_ids includes the key's team_id, or its assigned_key_ids includes the key's token. This fulfills the original ask (a key can extend a team's allow-list via a group the admin granted to that team or that specific key) while still rejecting foreign groups referenced by team members of other teams. * [Fix] Proxy/Key Management: Honor team_member_permissions /key/list In /key/list Endpoint When a team grants /key/list via team_member_permissions, non-admin members should see all keys for that team — same as a team admin. Previously the classification in list_keys() only checked admin status, so permitted members fell into the service-account-only path and could not see other members' personal keys. Routes those members into the full-visibility set. * Fix access-group bypass via litellm-model fallback path When _get_all_deployments returns 0 candidates and the litellm-model fallback branch (_get_deployment_by_litellm_model) finds deployments that the access-group filter then empties, _access_group_filter_emptied_candidates remained False (it was captured before that branch ran). The router would then proceed to default fallbacks; the fallback model could have no access_groups and short-circuit the filter, silently serving a caller blocked by access-group restrictions. Update the flag inside the litellm-model branch when filtering empties a non-empty candidate set so the default-fallback guard still triggers. * fix(proxy): redact MCP server URL and headers for non-admin viewers (VERIA-8) Many MCP integrations (Zapier, etc.) embed an upstream API key directly in the server URL, e.g. ``https://actions.zapier.com/mcp/<api-key>/sse``. The list and single-server endpoints were returning the full URL to any authenticated user — `_redact_mcp_credentials` only stripped the explicit ``credentials`` field, and `_sanitize_mcp_server_for_virtual_key` only ran for restricted virtual keys. Non-admin internal users could read the dashboard, click the unmask toggle, and exfiltrate the raw token. Add `_sanitize_mcp_server_for_non_admin` that runs on top of the existing credential redaction and clears the credential-bearing fields: - ``url`` (the primary leak vector) - ``spec_path`` (OpenAPI spec URLs that may carry tokens) - ``static_headers`` / ``extra_headers`` (Authorization) - ``env`` (arbitrary secrets) - ``authorization_url`` / ``token_url`` / ``registration_url`` Identity fields (``server_id``, ``alias``, ``mcp_info``, etc.) are preserved so the UI can still list servers a non-admin's team has access to. Apply the new sanitizer in `fetch_all_mcp_servers` and the per-server fetch path right after the existing virtual-key branch. Update the existing `test_list_mcp_servers_non_admin_user_filtered` assertions that previously checked URL visibility. Frontend defense-in-depth: hide the URL unmask toggle on `mcp_server_view.tsx` unless the viewer is a proxy admin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix runtime policy attachment initialization Mark runtime-created policies and attachments initialized so global policy attachments created from the policy builder apply immediately without requiring a restart. Co-authored-by: Cursor <cursoragent@cursor.com> * test(router): cover _try_early_resolve_deployments_for_model_not_in_names The router_code_coverage CI check requires every function in router.py to be referenced by at least one test under tests/{local_testing, router_unit_tests,test_litellm} in a file with "router" in its name. The recently-extracted helper had no direct test, so the check failed with "0.45% of functions in router.py are not tested". Add a focused test that exercises the four return paths: model already in self.model_names, no fallback applies, pattern-router match, and default_deployment substitution (also asserting the stored default isn't mutated). https://claude.ai/code/session_019AVp1XL7RT9RxRe4qRLkay * Fix policy registry teardown in tests Reset the policy ID index during policy engine test cleanup so stale policy versions cannot leak between tests. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(batches): count non-chat tokens, validate batch-file model access (VERIA-39) (#27015) * fix(batches): count non-chat tokens and validate every model in batch file Two security control bypasses on POST /v1/batches: 1. `_get_batch_job_input_file_usage` only summed tokens for `body.messages` (chat completions). Embedding (`input`) and text completion (`prompt`) batches reported zero, letting massive non-chat workloads slip past TPM rate limits. Extend the counter to handle string and list shapes for both fields. 2. The batch input file was forwarded to the upstream provider without inspecting the models named inside the JSONL — only the outer `model` query parameter was checked against the caller's allowlist. A caller restricted to gpt-3.5 could submit a batch targeting gpt-4o and the upstream would execute it under the proxy's shared API key. Add `_get_models_from_batch_input_file_content` (returns the distinct `body.model` values) and call it from `_enforce_batch_file_model_access` in the pre-call hook, which runs each model through `can_key_call_model` so the same allowlist semantics (wildcards, access groups, all-proxy-models, team aliases) the proxy enforces on `/chat/completions` apply here too. Any unauthorized model raises a 403 before the file is forwarded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(batches): count pre-tokenized prompt/input shapes, classify 403 logs Two follow-ups from the Greptile review on the batch validation PR: 1. P1 TPM bypass via integer token arrays. The OpenAI batch schema accepts ``prompt`` and ``input`` as ``list[int]`` (a single pre-tokenized prompt) or ``list[list[int]]`` (multiple) in addition to the string and ``list[str]`` shapes. Pre-fix only the string shapes were counted, so a caller could submit a batch with hundreds of millions of pre-tokenized tokens and the rate limiter would record zero. Extract the per-field logic into ``_count_prompt_or_input_tokens`` and count each int as one token. 2. P2 access-denial logs were indistinguishable from I/O failures. ``count_input_file_usage`` caught every exception under a generic "Error counting input file usage" message, so an intentional 403 from ``_enforce_batch_file_model_access`` looked the same in the logs as a missing file or a Prisma timeout. Catch ``HTTPException`` separately and log 403s at WARNING level with a security-relevant message before re-raising. Tests cover the new shapes: single ``list[int]``, ``list[list[int]]`` (the worst-case bypass vector), and embeddings ``input`` with pre-tokenized arrays. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): re-validate user_id after /user/info re-parses query (#27009) * fix(proxy): re-validate user_id ownership after /user/info re-parses query The route-level access check in `RouteChecks.non_proxy_admin_allowed_routes_check` reads `request.query_params.get("user_id")`, which decodes literal `+` to spaces. The endpoint then re-parses the raw query string with `urllib.unquote` in `get_user_id_from_request` to preserve `+` characters (so plus-addressed emails work as user_ids). Those two paths produce different ids: a caller who registered a user_id containing a literal space could pass the route check and then read another user's row by sending the encoded `+` form. Add `_enforce_user_info_access` and call it after `_normalize_user_info_user_id` returns the final id. Proxy admin / view-only admin still bypass; everyone else must match the resolved user_id (or have no user_id, which falls back to the caller's own id later in the handler). Tests cover the admin bypass, owner-match path, and the cross-user lookup that this change blocks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): apply user_info ownership check to PROXY_ADMIN_VIEW_ONLY `_enforce_user_info_access` was bypassing both PROXY_ADMIN and PROXY_ADMIN_VIEW_ONLY, but the upstream route check in `RouteChecks.non_proxy_admin_allowed_routes_check` only treats PROXY_ADMIN as a true admin for the `/user/info` route — view-only admins go through the `user_id == valid_token.user_id` enforcement along with regular users. Mirroring that asymmetry left the same encoded-`+` bypass open for view-only admins whose user_id contains a literal space. Drop the PROXY_ADMIN_VIEW_ONLY exemption so the post-decode re-check matches the upstream rule. Update tests: a view-only admin must now be blocked from cross-user lookups but still allowed to read their own row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(spend-logs): opt-in suppression of stack traces in spend-tracking error logs Adds LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS env var. When set to true and the proxy log level is INFO or above, spend-tracking error paths emit a single ERROR line without the full traceback. Stack traces are preserved at DEBUG and the Sentry / proxy_logging_obj.failure_handler path is unchanged. The new spend_log_error helper is wired through the spend write hot path: - DBSpendUpdateWriter (update_database, _update_*_db, batch upsert, redis-commit fallbacks) - _ProxyDBLogger._PROXY_track_cost_callback - get_logging_payload exception path - update_spend / update_daily_tag_spend / spend logs queue monitor Resolves LIT-2704. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(spend-logs): preserve no-traceback behavior for update_daily_tag_spend This call site previously logged a single-line error via verbose_proxy_logger.error() with no traceback. Switching it to spend_log_error(..., exc=e) caused a full stack trace to render by default (when LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS is unset), which contradicts the PR goal of leaving default behavior unchanged. Revert this specific site to the original error log call. * fix(spend-logs): preserve no-traceback behavior for update_daily_tag_spend Bugbot caught a regression: the previous error log here was a single-line verbose_proxy_logger.error(...) with no traceback. spend_log_error attaches the active exception's traceback by default (when the suppression env var is unset), so swapping it in changed default behavior. Revert this one site to its original .error() call to keep the PR strictly opt-in. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * feat(spend-logs): suppress traceback in SpendLogs error_information row Extend LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS to the failure callback so the per-row Metadata pane in the UI no longer shows the stack trace when the opt-in env var is set, matching the existing console-side suppression. https://claude.ai/code/session_014dztoRbRnRvq54HL9EyHx6 * [Fix] Proxy: Repair Merge Fallout In Router-Override Fallback Auth Conflict resolution for #26968 dropped the `Iterator` typing import (NameError at module load), left a dead `fallback_models = cast(...)` block, and the new tests called `_enforce_key_and_fallback_model_access` without the now-required `request` kwarg. * isolate dual OTEL handlers * harden cloud file compatibility path * harden cloud file compatibility path * [Fix] Proxy/Key Management: Align Key-Org Membership Checks On Generate And Regenerate Mirrors the membership rule on /key/update so that /key/generate and /key/{key}/regenerate apply the same `_validate_caller_can_assign_key_org` gate when the caller specifies an `organization_id`. Proxy admins bypass. The check no-ops when `organization_id` is not being set. * thread trusted params through vertex file content * trust only server legacy file flag * chore(proxy): keep public AI hub unauthenticated * fix(proxy): preserve low-detail readiness status * [Test] Anthropic: Replace Legacy Claude-4-Sonnet Alias With Haiku 4.5 Three live-API tests pinned to claude-4-sonnet-20250514, which is a non-canonical alias of claude-sonnet-4-20250514. Anthropic's main API no longer resolves the legacy form under freshly issued keys, so the tests fail with not_found_error. The token counter test pinned to claude-sonnet-4-20250514 itself (deprecation_date 2026-05-14, two weeks out) was on borrowed time too. Bump all four to claude-haiku-4-5-20251001 — capability superset for what these tests exercise (streaming, parallel tool calling, extended thinking, token counting), no upcoming deprecation, cheaper per-token. * chore(proxy): move URL-valued model/file_id guard from SDK to proxy The previous per-provider guards in HuggingFace, Oobabooga, and Gemini files lived in the SDK layer, breaking SDK callers who legitimately pass URL-valued model identifiers. Move the check to the proxy boundary in add_litellm_data_to_request so SDK users keep working while proxy users default-deny URL-valued model and file_id, with admin opt-in via litellm.provider_url_destination_allowed_hosts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [Chore] Proxy/UI: Drop stray _experimental/out/chat/index.html This file is a regenerable UI build artifact that should not be tracked in source. Removing so the merge into litellm_internal_staging stays clean. * [Test] Anthropic Passthrough: Bump Streaming Cost-Injection Test To Haiku 4.5 test_anthropic_messages_streaming_cost_injection hits the proxy's /v1/messages route, which routes via the anthropic/* wildcard to api.anthropic.com. The 404 surfaced in the test was Anthropic's own not_found_error propagated back through the proxy (visible from the x-litellm-model-id hash on the response — the proxy did route). Same root cause as the prior commit: the legacy claude-4-sonnet-20250514 alias is no longer recognized by Anthropic's main API under the new key. Swap to claude-haiku-4-5-20251001 — same routing path, canonical model. * fix(proxy): handle ownership-recording failures after upstream create If record_container_owner raises after the upstream container is created, the user previously got a 500 with no usable container — they were billed for an unreachable resource. Move ownership recording into the create path's exception handling and split the two failure modes: - HTTPException from the recorder (auth conflicts) propagates verbatim so the client sees the real status code, not a generic LLM error. - Unexpected exceptions are logged and swallowed; the response is returned to the caller so they aren't billed for a container they can't address. The DB row stays untracked until an operator reconciles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(guardrails): close post-call coverage gaps * fix(types): add /team/permissions_bulk_update to management_routes The blocklist check in _check_proxy_admin_viewer_access only fires for routes that match LiteLLMRoutes.management_routes — the bulk-update endpoint was missing from that list, so the test for view-only admins on /team/permissions_bulk_update fell through to "allow." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [Test] Anthropic Passthrough: Bump Thinking Tests Off Legacy Sonnet 4 Alias base_anthropic_messages_test.test_anthropic_messages_with_thinking and test_anthropic_streaming_with_thinking still pinned to claude-4-sonnet-20250514 — the same legacy alias Anthropic no longer recognizes under freshly issued keys. The other four tests in this base class already use claude-sonnet-4-5-20250929; these two were missed. Bump to claude-haiku-4-5-20251001 (supports_reasoning=true, no upcoming deprecation). Subclasses including TestAnthropicPassthroughBasic inherit these methods. * fix(guardrails): cover multi-choice output variants * fix(proxy): preserve public ai hub ui setting * fix(scim): cascade FK cleanup on user delete and surface block status in UI SCIM DELETE /Users/{id} previously called litellm_usertable.delete without clearing rows that FK back to the user, so Postgres rejected the delete with LiteLLM_InvitationLink_user_id_fkey and the SCIM caller saw a 500. Add a helper to drop invitation_link, organization_membership, and team_membership rows before the user delete (mirrors /user/delete in internal_user_endpoints). Also add a Status column to the Virtual Keys and Internal Users tables so admins can see at a glance which keys are blocked and which users SCIM has deactivated. SCIM-blocked keys carry a tooltip explaining the origin. Pin the dashboard's Node version to 20 via .nvmrc to match CI. * chore: update Next.js build artifacts (2026-05-02 03:21 UTC, node v20.20.2) * perf(proxy): cache container/skill ownership reads on the hot path Container ownership and skill rows are looked up on every retrieve / delete / list / file-content / chat-completion-with-skill call. The new stores wrapped raw Prisma queries with no cache, putting one DB round-trip on each request. Add an in-process TTL'd cache mirroring the _byok_cred_cache pattern in mcp_server/server.py: per-key (value, monotonic_timestamp), 60s TTL, 10000-entry cap with full-clear on overflow, invalidated by every write. Negative results (`None`) are cached too so untracked-resource checks also skip the DB. Tests cover: cache-after-first-hit, negative caching, write invalidation, no-caching-on-DB-error, TTL expiry, capacity eviction. 56 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: update Next.js build artifacts (2026-05-02 03:39 UTC, node v20.20.2) * fix: remove traceback key instead of it being "" * fix: linting error * fix(scim): preserve scim_active on PUT when client omits the field A SCIM PUT may legally omit `active` (full-replace with the field absent). Pydantic fills the SCIMUser.active default of True, so the PUT handler was overwriting metadata.scim_active with True even when the client never sent it — silently reactivating a previously SCIM-blocked user and unblocking their keys. Use model_fields_set to detect whether the client actually sent `active`. If omitted, preserve the prior scim_active value and skip the cascade to virtual keys. Also drop comments added in this PR that just narrate what the code does; keep only the docstrings and the SQL-NULL pitfall note that explain non-obvious behaviour. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(proxy): use set lookup for permitted agent filters * fix(mcp): redact command fields for non-admin server views * fix(proxy): forward decoded container ids after ownership checks * fix(caching): handle stale isolated Redis semantic index * fix(cloudflare): support response_text in streaming chunk parser Newer Cloudflare Workers AI models (e.g. Nemotron) emit 'response_text' instead of 'response' on streamed chunks. The non-streaming path was already updated to fall back to 'response_text' (#26385), but the streaming chunk parser still only read 'response', which caused streaming requests against those models to silently produce empty content. Mirror the non-streaming fallback in CloudflareChatResponseIterator.chunk_parser and add a streaming test for the response_text shape. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * Fix code qa * Address bugbot: drop dead encode/decode helpers; preserve empty custom_id - Remove unused _encode_gcp_label_value / _decode_gcp_label_value singular helpers; only the _chunks variants are actually called. - Use 'is not None' check for custom_id so empty-string custom_ids are still labeled and round-trip through batch outputs. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * Forward Vertex file content logging context * test vertex file content logging forwarding Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * Fix Vertex batch output logging mutation * fix: don't mutate caller's logging_obj in _try_transform_vertex_batch_output_to_openai The method was overwriting logging_obj.optional_params, logging_obj.model, and logging_obj.start_time on the caller's Logging instance. When invoked from llm_http_handler.py's generic framework path, the framework's own logging_obj (which already went through pre_call) had its properties clobbered, causing model and start_time to reflect the last batch line's values rather than the original call context. Fix: create a fresh local Logging instance for the per-line transformation instead of mutating the incoming logging_obj. The caller's object is now left entirely untouched regardless of whether a logging_obj was passed in or not. Regression tests added to verify model, start_time, and optional_params are not mutated on the caller's logging_obj. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * feat: add opt-out flag for Vertex batch output transformation Adds litellm.disable_vertex_batch_output_transformation (default False). When True, afile_content returns raw Vertex predictions.jsonl untouched so users that parse candidates/modelVersion directly are not broken. * fix(anthropic,bedrock): omit thinking/output_config when reasoning_effort="none" Setting reasoning_effort="none" on Anthropic chat models (direct, Bedrock Invoke, Bedrock Converse, Vertex AI Anthropic, Azure AI Anthropic) crashed LiteLLM with: litellm.APIConnectionError: 'NoneType' object has no attribute 'get' Both the Anthropic chat transformation and Bedrock Converse called ``AnthropicConfig._map_reasoning_effort`` and assigned the ``None`` it returns for ``"none"`` directly to ``optional_params["thinking"]``. Downstream ``is_thinking_enabled`` then did ``optional_params["thinking"].get("type")`` and crashed. Pop ``thinking`` (and on Claude 4.6/4.7, ``output_config``) instead of assigning ``None``, restoring the documented contract that ``reasoning_effort="none"`` means "do not enable thinking". This also prevents downstream Anthropic 400s ("thinking: Input should be an object", "output_config.effort: Input should be ...") if the bug were ever masked. Verified end-to-end against the live Anthropic API and Bedrock Converse on claude-opus-4-{5,6,7} and claude-sonnet-4-6, plus Bedrock Invoke for Claude 4.5/4.6. Vertex AI Anthropic and Azure AI Anthropic inherit the fixed ``map_openai_params`` from ``AnthropicConfig`` and need no further changes. * fix(vertex-ai): set response=null on batch error entries per OpenAI spec The Vertex batch output transformer was emitting both a populated 'response' and 'error' for failed batch entries. The OpenAI Batch output spec defines them as mutually exclusive: on error 'response' MUST be null. This broke any consumer using 'result["response"] is None' to detect failures. * test(vertex-ai): cover transformation_error path emits response=null * fix(security): sandbox jinja2 in gitlab/arize/bitbucket prompt managers DotpromptManager was hardened to render through ImmutableSandboxedEnvironment. The three sibling managers (gitlab, arize, bitbucket) were missed and still instantiate plain jinja2.Environment(), leaving the same attribute-traversal SSTI primitive open: a template fetched from a GitLab/BitBucket repo or Arize Phoenix workspace can reach __class__.__init__.__globals__ and execute arbitrary Python on the proxy host. Match the dotprompt pattern by switching all three to ImmutableSandboxedEnvironment. The sandbox blocks the dunder-traversal chain while leaving normal {{ var }} substitution intact, so the template surface is unchanged for legitimate use. Adds tests/test_litellm/integrations/test_prompt_manager_ssti.py (18 cases) verifying each manager's jinja_env is a sandbox, that classic SSTI payloads raise SecurityError, and that ordinary variable rendering still works. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(proxy): drop client-supplied pricing fields from request bodies The proxy currently forwards request-body pricing parameters (the fields on `CustomPricingLiteLLMParams`, plus `metadata.model_info`) into the core call path. Those fields belong to deployment configuration, not to per-request input — sending them from a client mutates the request's recorded cost and, via `litellm.completion` → `register_model`, the process-wide `litellm.model_cost` map for every later caller in the worker. Strip them at the boundary. The strip set is built from `CustomPricingLiteLLMParams.model_fields` so pricing fields added later are covered automatically. Operators who do want clients to supply per-request pricing can opt back in per key or team via `metadata.allow_client_pricing_override = true`, mirroring the existing `allow_client_mock_response` and `allow_client_message_redaction_opt_out` flags. Tests cover the strip set's coverage, root and metadata strips, the opt-in skip on both key and team metadata, and a regression check that the global `litellm.model_cost` map is unmutated after a stripped request. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(proxy): log stripped pricing fields at debug for operator visibility Operators upgrading would otherwise see client-supplied pricing overrides silently stop applying with no diagnostic. Emit a debug-level line listing the dropped fields and pointing at the opt-in flag when any are stripped; stay silent on the no-op path so the log isn't filled with noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): move pricing strip below the litellm_metadata JSON-string parse The strip ran before the proxy parses ``litellm_metadata`` from a JSON string into a dict (a path used by multipart/form-data and ``extra_body`` callers), so ``isinstance(metadata, dict)`` was False and ``model_info`` survived the strip. Move the call to the same post-parse position the ``user_api_key_*`` strip already uses for the same reason. Adds a regression test exercising the JSON-string ``litellm_metadata`` path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(responses): replace legacy claude-4-sonnet alias in multiturn tool-call test Anthropic's main API no longer resolves the non-canonical 'claude-4-sonnet-20250514' alias for freshly issued keys, returning 404 not_found_error. PR #27031 already swept three other live tests pinned to this alias to claude-haiku-4-5-20251001 but missed test_multiturn_tool_calls in the responses API suite, which is now failing reliably on PR CI runs (e.g. PR #27074, job 1603363). Bump the two model references in test_multiturn_tool_calls to the same claude-haiku-4-5-20251001 snapshot used by PR #27031 -- it covers everything this test exercises (tool calling, multi-turn) and isn't on a deprecation schedule. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * chore(proxy): close callback-config and observability-credential side channels Two related gaps in the proxy's request bouncer: 1. ``is_request_body_safe`` (auth_utils.py) walked the request-body root and the ``litellm_embedding_config`` nested dict, but not ``metadata`` or ``litellm_metadata``. The same fields it bans at root — Langfuse / Langsmith / Arize / PostHog / Braintrust / Phoenix / W&B Weave / GCS / Humanloop / Lunary credentials and routing — were silently accepted when the caller put them inside metadata, retargeting observability callbacks to a caller-controlled host with caller-supplied creds. Walk both metadata containers (and parse the JSON-string form sent via multipart / ``extra_body``) through the same banned-params helper, so the existing ``allow_client_side_credentials`` opt-in covers both paths consistently. 2. The banned-params list was hand-maintained and lagged the canonical ``_supported_callback_params`` allow-list in ``initialize_dynamic_callback_params``. Derive the observability bans from that allow-list (minus a small ``_SAFE_CLIENT_CALLBACK_PARAMS`` set for informational fields like ``langfuse_prompt_version`` and ``langsmith_sampling_rate``) so future integrations are covered automatically; ``_EXTRA_BANNED_OBSERVABILITY_PARAMS`` carries the handful of fields integrations read but the allow-list hasn't caught up to. A guard test fails CI if a new entry is added to ``_supported_callback_params`` without an explicit safe-list decision. Separately in ``litellm_pre_call_utils.py``: add ``callbacks``, ``service_callback``, ``logger_fn``, and ``litellm_disabled_callbacks`` to ``_UNTRUSTED_ROOT_CONTROL_FIELDS``. The first three are appended to worker-wide ``litellm.{input,success,failure,_async_*,service}_callback`` lists / ``litellm.user_logger_fn`` from inside ``function_setup`` — one request poisons every subsequent caller in that worker. The last is the inverse primitive: the legitimate path reads it from key/team metadata, the request-body version silently disables admin-configured audit / observability for the call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(auth): per-param allow must continue, not return early A pre-existing logic bug in ``_check_banned_params``: when the deployment-level ``configurable_clientside_auth_params`` permitted one banned field, the loop ``return``-ed on the first match instead of ``continue``-ing, so any other banned param later in the same body or metadata dict was never checked. This PR's metadata walk multiplies the surface where that bypass matters — a body pairing an allowed ``api_base`` with an observability credential like ``langfuse_host`` would silently pass. Proxy-wide ``allow_client_side_credentials`` keeps ``return`` (it's a global opt-in for every banned param). The per-param branch becomes ``continue`` so only the one explicitly-permitted field is skipped. Adds a regression test that exercises the api_base + langfuse_host pair. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(vector_store): resolve embedding config at request time, never persist creds The vector store create/update path previously called ``_resolve_embedding_config`` against the admin-configured router/DB model and persisted the resolved ``litellm_embedding_config`` dict (``api_key`` / ``api_base`` / ``api_version``) into the ``litellm_managedvectorstorestable.litellm_params`` column. Because the resolver expanded ``os.environ/...`` references via ``get_secret``, the DB row carried cleartext provider credentials, and the ``/vector_store/{new,info,update,list}`` responses returned them to any authenticated caller who could supply a known admin model name. Move the auto-resolve out of ``create_vector_store_in_db`` and out of the update path. Persist only the user-supplied ``litellm_embedding_model`` reference. Resolve at request-handling time inside ``_update_request_data_with_litellm_managed_vector_store_registry`` so the resolved config lives in the per-request ``data`` dict and is garbage-collected after the response. Legacy rows that were created by an earlier proxy version and already carry a resolved ``litellm_embedding_config`` skip the re-resolution and pass through unchanged so embedding calls keep working. The ``new_vector_store`` response now also runs the existing ``_redact_sensitive_litellm_params`` masker (already used by ``info``, ``update``, and ``list``), defending against caller-supplied cleartext on the create path and against legacy rows whose persisted credentials are still in the database. Existing tests that asserted the old write-time-resolve behaviour are updated to assert the new persistence shape (no embedding config stored, just the model reference). Two new tests cover the use-time path: one asserting fresh resolution happens when a row carries only the model reference, the other asserting legacy rows with persisted config skip re-resolution and continue to work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(vector_store): tighten registry-mutation comment and dedupe test helpers * fix(vector_store): cache use-time embedding-config resolution Hold the resolved config in a process-memory TTL cache so the request-handling path doesn't run litellm_proxymodeltable.find_first on every vector-store call. * fix(anthropic,bedrock,vertex): forward output_config.effort + 400 on garbage reasoning_effort Follow-up bugs surfaced by the QA sweep on PR #27039 (https://github.com/BerriAI/litellm/pull/27039#issuecomment-4363363610). 1. Stop stripping output_config.effort on Bedrock + Vertex adaptive routes. - Vertex AI Claude 4.6/4.7 accepts output_config.effort on rawPredict (verified end-to-end against us-east5 / global). The strip helper now no-ops for effort. - Bedrock Converse routes output_config into additionalModelRequestFields for anthropic base models so the requested adaptive tier (low/medium/ high/xhigh/max) actually reaches the wire instead of all collapsing to identical thinking. - Bedrock Invoke chat transformation (AmazonAnthropicClaudeConfig) stops popping output_config from the post-AnthropicConfig request body. - Bedrock Invoke /v1/messages allowlist (BedrockInvokeAnthropicMessagesRequest) now lists output_config so the runtime allowlist filter forwards it. 2. Validate effort across Bedrock Converse so 'disabled' / 'invalid' / '' / unsupported tiers (xhigh/max on Sonnet 4.6 or budget-mode 4.5 models) surface as a clean 400 BadRequestError instead of 500. 3. ValueError -> BadRequestError throughout (AnthropicConfig.map_openai_params, _apply_output_config, AmazonConverseConfig._handle_reasoning_effort_parameter). Empty-string effort is now rejected (was silently passing the 'if effort and ...' short-circuit). 4. Floor reasoning_effort='minimal' at the Anthropic provider minimum (1024 budget_tokens) via new ANTHROPIC_MIN_THINKING_BUDGET_TOKENS so it's a usable tier on direct Anthropic / Azure AI Anthropic / Vertex AI Anthropic / Bedrock Invoke (all of which 400 below 1024). 5. model_prices: dedupe duplicate supports_max_reasoning_effort key on claude-opus-4-7 / claude-opus-4-7-20260416. Adds regression tests across all five affected paths; existing tests asserting the silent-strip behavior were updated to reflect the new pass-through and clean 400 surfaces. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(constants): make ANTHROPIC_MIN_THINKING_BUDGET_TOKENS a plain constant The documentation CI test (tests/documentation_tests/test_env_keys.py) asserts every os.getenv() key in the source has a matching entry in the litellm-docs config_settings.md table. ANTHROPIC_MIN_THINKING_BUDGET_TOKENS tracks Anthropic's published wire-protocol minimum (1024) — it's not a user-tunable, so making it env-overridable was wrong anyway. Drop the os.getenv() wrapper; the value is now a plain literal. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(anthropic,bedrock): correct effort error message and dedupe effort_map - Remove 'none' from the Bedrock _validate_anthropic_adaptive_effort error message; it was listed as a valid value but rejected by the membership check, leaving users in a feedback loop if they tried 'none'. - Hoist the duplicated reasoning_effort -> output_config.effort mapping out of AnthropicConfig.map_openai_params and AmazonConverseConfig._handle_reasoning_effort_parameter into a single AnthropicConfig.REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT class constant so the two routes cannot drift. * fix(anthropic): translate reasoning_effort on /v1/messages route Closes the remaining QA-sweep gap on PR #27074: Bedrock Invoke /v1/messages was silently ignoring ``reasoning_effort`` because the shared param filter only kept native Anthropic keys, so every effort tier collapsed to the same behavior on the wire (27/231 cells failing across opus-4-5 / opus-4-6 / sonnet-4-6). Map ``reasoning_effort`` to native Anthropic ``thinking`` / ``output_config.effort`` at the ``AnthropicMessagesConfig`` layer so all four /v1/messages routes (direct Anthropic, Azure AI, Vertex AI, Bedrock Invoke) inherit the same translation: - Add ``reasoning_effort`` to ``AnthropicMessagesRequestOptionalParams`` so the param filter in ``AnthropicMessagesRequestUtils.get_requested_anthropic_messages_optional_param`` no longer drops it before the transformation runs. - Add ``_translate_reasoning_effort_to_anthropic`` and call it from ``transform_anthropic_messages_request``. Mirrors ``AnthropicConfig.map_openai_params`` on the chat completion path (re-uses ``_map_reasoning_effort`` and ``REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT``) so the two routes cannot drift. Pops ``reasoning_effort`` so it never reaches the wire. - Caller-supplied native ``thinking`` / ``output_config.effort`` always win — same precedence as ``_translate_legacy_thinking_for_adaptive_model``. - Garbage values (``""``, ``"disabled"``, ``"invalid"``) raise ``AnthropicError(status_code=400)`` instead of falling through and surfacing as 500s from the provider. - ``"none"`` clears thinking + output_config so callers can opt out per request. Also restores the non-adaptive-model test coverage on Bedrock Invoke /v1/messages that the previous commit lost when ``test_bedrock_messages_strips_output_config`` was renamed to the ``forwards`` variant on Opus 4.7. Adds a new test file ``test_reasoning_effort_translation.py`` covering the translation at the shared config level (adaptive + non-adaptive models, none, garbage, caller precedence) so all four /v1/messages routes are exercised by a single suite. Adds parametrized + behavioral tests on the Bedrock Invoke /v1/messages suite covering: minimal/low/medium/high/xhigh/max mapping for adaptive models, thinking-budget mapping for non-adaptive Opus 4.5, ``none`` clears both, garbage raises 400, explicit ``output_config`` wins. Refs: https://github.com/BerriAI/litellm/pull/27074 * fix(anthropic,bedrock): reject unmapped reasoning_effort at mapping site Both the chat completion path (AnthropicConfig.map_openai_params) and the Bedrock Converse path (_handle_reasoning_effort_parameter) used REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(value, value) which falls back to the raw input on unmapped keys. Combined with _map_reasoning_effort returning type='adaptive' for any string on Claude 4.6/4.7, garbage values (e.g. 'disabled') could leak into optional_params['output_config']['effort'] unvalidated if map_openai_params ran without the downstream transform_request or _validate_anthropic_adaptive_effort check. Mirror the /v1/messages pattern: use .get(value) (no fallback) and raise BadRequestError immediately when the value is unmapped, co-locating validation with the mapping for defense in depth. * style: black formatting Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(anthropic): stop class-attr leak; gate xhigh/max on every route The reasoning-effort mapping dict was a public class attribute on AnthropicConfig, so BaseConfig.get_config returned it as a request parameter and every Anthropic-backed call (Anthropic / Azure / Vertex / Bedrock Invoke) hit a 400 'REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT: Extra inputs are not permitted' from the provider. Move the mapping to a module-level constant. _supports_effort_level only looked the model up under custom_llm_provider='anthropic', so bedrock-prefixed model ids (e.g. bedrock/invoke/us.anthropic.claude-opus-4-7) returned False for both 'max' and 'xhigh' even when the underlying model entry has the flag set. Strip known provider prefixes and retry the lookup against litellm.model_cost directly so per-model gating works on every route. Mirror the per-model xhigh/max gate from AnthropicConfig._apply_output_config in AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic so the /v1/messages route also raises a clean 400 instead of forwarding the unsupported tier. * feat(anthropic,bedrock): strip output_config under drop_params for non-effort models When a proxy fronts Claude Code (which always sends `output_config.effort`) at a pre-4.5 Anthropic model — haiku-3, sonnet-3.5, opus-3, sonnet-4 — the forwarded knob causes a forced 400 the client can't fix. Gating a strip behind the existing `drop_params` flag lets operators opt into silent fixup once and stop worrying about per-model param hygiene. Default (`drop_params=False`) still forwards and surfaces the provider's error, preserving the strict, debuggable contract from #27074. Per https://platform.claude.com/docs/en/build-with-claude/effort the supporting set is Opus 4.5+, Sonnet 4.6+, and Mythos Preview; everything else is dropped (with a verbose_logger warning so the strip is visible). Recognition uses model-name patterns plus a fallback to any `supports_*_reasoning_effort` flag in the model map for forward compatibility with new entries. https://claude.ai/code/session_01WjHq31rvXT6xYNdVmSJvRp (cherry picked from commit 1233943e7861ba8a9062f792310ebd401cb03db8) * fix(base_llm): filter all _-prefixed class attrs from get_config The drop_params strip work added `AnthropicConfig._EFFORT_SUPPORTING_MODEL_PATTERNS` as a private class-level lookup tuple. `BaseConfig.get_config()` only filtered the `__`-prefixed names plus `_abc` / `_is_base_class`, so `_EFFORT_SUPPORTING_MODEL_PATTERNS` would have leaked into the request body the same way `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT` did before the previous commit. Generalize the existing `_abc` / `_is_base_class` carve-outs to skip every `_`-prefixed name. `AmazonConverseConfig.get_config()` overrides the base method, so apply the same change there. Also unblocks future internal helpers from accidentally serialising into the wire body. * fix(anthropic): drive output_config.effort support from model map flags Replace hardcoded _EFFORT_SUPPORTING_MODEL_PATTERNS with a JSON-backed check that uses supports_*_reasoning_effort flags from the model map. Add supports_minimal_reasoning_effort: true to opus-4-5 and mythos-preview entries (which previously only carried supports_reasoning) so the JSON remains the single source of truth for effort capability. * fix(anthropic,bedrock,databricks): four reasoning_effort follow-ups - claude-sonnet-4-6 + reasoning_effort=max no longer 400s. Renamed _is_opus_4_6_model to _is_claude_4_6_model at three sites and added supports_max_reasoning_effort: true to 12 model entries in the JSON cost map (10 sonnet 4.6 ids + OpenRouter opus 4.6/4.7). - _map_reasoning_effort now raises BadRequestError(400) directly with llm_provider, instead of letting Databricks (and similar callers) surface its raw ValueError as a 500. - output_config.effort on Opus 4.5 over Bedrock no longer 400s for missing effort-2025-11-24 beta. Flipped JSON to "effort-2025-11-24" for bedrock + bedrock_converse and added an auto-attach branch in _process_tools_and_beta for non-adaptive Anthropic + output_config on Converse. - reasoning_effort=xhigh / =max on legacy budget-mode models (Haiku 4.5, Sonnet 4.5, Opus 4.5) now map to thinking.budget_tokens 8192 / 16384 instead of returning 400. Added two constants in litellm/constants.py. Tests updated for all four flips. Validated end-to-end via 306-cell live proxy matrix (6 model families x 3 routes x 17 effort cases), all pass. * fix(databricks): validate reasoning_effort and set output_config on adaptive Claude The Databricks path called `AnthropicConfig._map_reasoning_effort` for Claude models but never validated the effort string nor set `output_config.effort` for adaptive models (Claude 4.6/4.7). Since `_map_reasoning_effort` returns `type=adaptive` for ANY non-None / non-"none" string on adaptive models (including "disabled", "invalid", ""), Databricks silently accepted garbage and emitted a request without an `output_config.effort`, collapsing every adaptive tier to identical behavior. Match the Anthropic native, Bedrock Converse, Bedrock Invoke, and /v1/messages paths: when the resolved `thinking` is non-None on a 4.6/4.7 model, look up the value in `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT` and either raise a clean `BadRequestError` or set `optional_params["output_config"]`. * fix(azure): omit model from image generation and image edit deployment requests Azure OpenAI routes image gen/edit by deployment in the URL; sending the deployment id in model breaks gpt-image-2 (invalid_value). Strip model from JSON for deployments/.../images/generations and from multipart data for .../images/edits. Non-deployment URLs (e.g. Azure AI FLUX) unchanged. Fixes #26316. Co-authored-by: Cursor <cursoragent@cursor.com> * test(azure): exercise image gen JSON filter via HTTP client; dedupe image edit URL - Image generation tests patch HTTPHandler.post / get_async_httpx_client so make_*_azure_httpx_request runs and wire json is asserted on call kwargs. - Azure image edit: strip model in finalize_image_edit_multipart_data using the same URL string the handler passes to POST (no second get_complete_url in transform). BaseImageEditConfig default finalize is a no-op. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(azure_ai/anthropic): promote output_config out of extra_body so validation runs `azure_ai` is registered in `litellm.openai_compatible_providers`, so `add_provider_specific_params_to_optional_params` (litellm/utils.py) auto-stuffs any non-OpenAI kwarg (e.g. `output_config={"effort": "..."}`) into `optional_params["extra_body"]`. `AzureAnthropicConfig.transform_request` then strips `extra_body` entirely on the way out, silently dropping the param — and `AnthropicConfig._apply_output_config` never sees it, so `effort="invalid"` / `effort="xhigh"` on a non-supporting model quietly reaches the model with default behavior instead of returning a clean 400 (as the native `anthropic` provider does). Promote the keys back to top-level `optional_params` (using `setdefault` so explicit top-level values win) before delegating to the parent `AnthropicConfig`. Apply in both `validate_environment` and `transform_request` so flag detection (`is_mcp_server_used`, etc.) and output-config validation both run. Surfaced by the QA matrix expansion on PR #27074: 20 cells where Azure returned 200 while `anthropic` returned 400 — all `output_config` mode across haiku_4_5, sonnet_4_5, opus_4_5, sonnet_4_6, opus_4_6, opus_4_7 families with `effort` in {invalid, xhigh, max, low, medium, high}. Tests: * `test_output_config_promoted_from_extra_body`: valid effort reaches data * `test_invalid_output_config_effort_raises_via_extra_body`: 400 on bad effort * `test_unsupported_effort_xhigh_raises_via_extra_body`: 400 on xhigh-on-Sonnet-4.6 * `test_extra_body_promotion_does_not_clobber_top_level`: setdefault semantics * test(image_gen): expect no model in Azure image edit multipart (#26316) Align test_azure_image_edit_litellm_sdk with deployment-scoped Azure edits. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(anthropic): extract _validate_effort_for_model to prevent drift The chat completion path (`_apply_output_config`) and the /v1/messages pass-through (`AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic`) both gate `max` / `xhigh` per model. The two sites had diverged from near-identical copies into separately maintained blocks, creating a real drift risk when a new model tier (e.g. Claude 4.8) lands -- a contributor could update one site and miss the other. Centralise the gating in `AnthropicConfig._validate_effort_for_model`, which returns an error message string or `None`. Each call site keeps its own provider-appropriate exception type (`BadRequestError` for the chat path, `AnthropicError` for the /v1/messages pass-through) but the gating decision now comes from one place. Net -11 LOC. Adds a parametrised unit test exercising the helper directly across 4.5 / 4.6 / 4.7 model families and `max` / `xhigh` / lower-effort inputs. Existing tests at both call sites continue to pass unchanged. Addresses Greptile finding on PR #27074. * fix(databricks): narrow reasoning_effort_value to str for mypy `non_default_params.get("reasoning_effort")` returns `Any | None`, but `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get()` expects `str`. Mypy flagged this on the strict pass. Narrow with `isinstance` before the lookup; non-strings fall through to the existing `BadRequestError` below with a clean validation message, so behavior is unchanged. Fixes a regression introduced by |
||
|
|
4af58e1f97
|
[Security] Clear AWS Inspector CVE findings on Docker image
- Narrow /root/.cache COPY in Dockerfile to /root/.cache/prisma{,-python}
only — drops ~660MB of uv build cache including a setuptools wheel
that surfaced as CVE-2024-6345 / CVE-2025-47273 even though it was
never on the runtime sys.path.
- DiskCache: switch to dc.JSONDisk to neutralize the pickle code path
(CVE-2025-69872, no upstream fix). Values must be JSON-serializable;
cleanup get_cache to skip the now-dead json.loads(dict) branch by
guarding on isinstance(str).
- pyproject.toml: drop diskcache pin from [caching] extra (no fixed
version exists). Stub kept so `pip install litellm[caching]` doesn't
warn; users who want disk caching install diskcache themselves.
- Bump black 24.10.0 → 26.3.1 (CVE-2026-32274) + apply 296-file mechanical
reformat. Black is dev-only (not in the runtime image), but bumping
clears the manifest-scan finding.
- Refresh ui/litellm-dashboard/package-lock.json to pick up next 16.2.4
(was 16.1.7, GHSA-q4gf-8mx6-v5v3), uuid 14.0.0, postcss 8.5.13.
- Refresh litellm-js/spend-logs/package-lock.json to pick up
hono 4.12.16 (GHSA-458j-xx4x-4375).
- uv lock: gitpython 3.1.46 → 3.1.49 (clears two High GHSAs),
langchain-text-splitters 1.1.1 → 1.1.2.
- Add tests/test_litellm/caching/test_disk_cache.py covering JSONDisk
enforcement, dict/string round-trip, TTL, increment, delete/flush.
Net delta on combined trivy + grype scans: 17 findings → 4 (all
remaining 4 are Wolfi system python-3.13 CVEs marked WONTFIX upstream
in CPython 3.14; CVE-2026-3298 is Windows-unreachable on Linux).
Existing on-disk caches written by the previous pickle-format Disk
will silently miss after upgrade — diskcache is intended to be
ephemeral so impact is recreate-on-next-write.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1dcfc36393 | chore(deps): align dashboard node engine | ||
|
|
bfdd786962 | chore(deps): refresh dependency locks | ||
|
|
8d1493ed08
|
fix(security): bump vulnerable dependencies
pip: - cryptography 43.0.3 → 46.0.7 (5 CVEs including CVSS 8.2 ECDH key leak) npm: - hono 4.1.4/4.12.7 → 4.12.12 (prototype pollution, cookie injection, path traversal, middleware bypass, IP matching bypass) - @hono/node-server 1.19.6 → 1.19.13 (serveStatic middleware bypass) - vite 7.3.1 → 7.3.2 (file read via WebSocket, path traversal, fs.deny bypass) - lodash override 4.17.23 → 4.18.1 (code injection via _.template, prototype pollution via _.unset/_.omit) mlflow left at 3.9.0 — 2 of 3 alerts have no upstream fix, and 3.11.1 is blocked by exclude-newer (transitive dep chain). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
006d481025
|
[Fix] Remove neon CLI dependency and pin all JS dependencies
Remove @neondatabase/api-client and neonctl to address CVE-2026-25639 (axios supply chain vulnerability). Pin all JS dependencies to exact versions across all package.json files to prevent future supply chain attacks via semver range resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
62835ff03d | adding package-lock | ||
|
|
f5e5d17e4a
|
fix(mcp): fix OpenAPI OAuth flow — transport mapping, error messages, and discovery bypass (#23315)
* fix(mcp): fix OpenAPI OAuth flow — transport mapping, error messages, and discovery bypass
Three bugs fixed to make the end-to-end OAuth flow work for OpenAPI MCP servers:
1. **Transport mapping in getTemporaryPayload**: `TRANSPORT.OPENAPI` is a UI-only concept;
the backend only accepts `"http"`, `"sse"`, or `"stdio"`. The pre-OAuth temp-session
call was sending `transport: "openapi"` and getting a 422. Fixed by mapping to `"http"`.
2. **deriveErrorMessage handles FastAPI 422 arrays**: FastAPI validation errors return
`detail` as an array of `{loc, msg, type}` objects. The shared error extractor was
returning the array directly, causing `Error: [object Object]`. Fixed to map each
item to its `.msg` field.
3. **Skip OAuth discovery when authorization_url already provided**: `build_mcp_server_from_table`
was unconditionally calling `_descovery_metadata(server_url)` for OAuth servers. For
OpenAPI servers the url is the spec JSON file, not the API base — this caused a timeout
fetching e.g. the GitHub spec (2 MB). Fixed by skipping discovery when `authorization_url`
is already set.
Also: collapsible auth section in MCP server form, "Create OAuth App →" link next to
Client ID when a docs URL is available (e.g. GitHub OAuth App creation page), and
`extractErrorMessage` helper in `useMcpOAuthFlow` for cleaner error display.
* refactor(mcp): extract needs_discovery flag and reduceStaticHeaders helper
* feat(mcp): user OAuth connect flow — OAuthConnectModal, MCPCredentialsTab, useUserMcpOAuthFlow
Adds the user-facing MCP OAuth2 PKCE connect flow:
- OAuthConnectModal: modal that launches the PKCE flow for a user to connect to an MCP server
- MCPCredentialsTab: credentials management tab in the MCP apps panel
- useUserMcpOAuthFlow: hook that handles the full PKCE auth code exchange for user-level connections
- MCPAppsPanel: wires up the new credentials tab and connect modal
- ChatPage: further cleanup after responses-API revert
- db.py / mcp_management_endpoints.py / _types.py: backend support for storing user MCP credentials
* fix(mcp): make client_id optional in /authorize — use server's stored client_id when not provided
* address greptile review feedback
* fix(mcp): narrow bare except to RecordNotFoundError in BYOK credential delete
* refactor(mcp): move inline imports to module level in db.py
* docs(claude): add MCP OAuth, transport mapping, and browser storage patterns
* fix(security): remove accessToken from sessionStorage in OAuth flow state
The LiteLLM API key was being serialised into sessionStorage as part of
StoredFlowState. After the OAuth redirect the component re-mounts with the
same accessToken prop, so it never needed to be stored. Read it from props
in resumeOAuthFlow instead.
* fix(ui): remove duplicate extractErrorMessage, sessionStorage-only in admin OAuth hook, call delete API on disconnect
* fix(ui): guard resumeOAuthFlow against wrong hook instance consuming OAuth result
* fix(ui): separate OAuth result keys per flow, sessionStorage-only, surface revoke errors
* fix(ui): remove dead OAuthConnectModal, revert tsconfig jsx mode to preserve
* fix(mcp): guard BYOK overwrite in oauth credential store, raise clear error when client_id absent
* fix: forward OAuth error params in callback, fix BYOK guard exception handling in db.py
|
||
|
|
03ca98123f
|
Agents health checks (#23044)
* feat: add health check toggle to agents page Backend: - Add health_check query parameter to GET /v1/agents endpoint - When health_check=true, performs concurrent GET requests to each agent's URL and filters out agents with unreachable URLs (5s timeout) - Agents returning HTTP <500 are considered healthy; 5xx and connection errors mark agents as unhealthy UI: - Add Health Check toggle (Switch) to agents panel header - Toggle triggers re-fetch with health_check=true, filtering the agent list - Icon color changes (green/gray) to indicate toggle state - Tooltip explains behavior: 'only agents with reachable URLs are shown' Networking: - Update getAgentsList to accept optional healthCheck boolean parameter Tests: - Backend: 9 new tests covering health check filtering, _check_agent_url_health helper (no URL, 200, 404, 500, connection error cases) - UI: 3 new tests verifying toggle renders, initial fetch without health check, and fetch with health check after toggle click Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> * fix: fix greptile comment re: security issue * fix: fix based on greptile feedback * fix: align health check tests with implementation - Rename test_should_return_unhealthy_when_no_url to test_should_return_healthy_when_no_url (implementation returns healthy=True for agents without a URL) - Patch get_async_httpx_client instead of httpx.AsyncClient so mocks actually intercept the HTTP calls made by _check_agent_url_health - Remove unnecessary __aenter__/__aexit__ context-manager mocks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: undo _experimental/out renames from cherry-pick Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update litellm/proxy/agent_endpoints/endpoints.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> |
||
|
|
cfd0e2cf99
|
[Feat] UI Polish - MCP Servers page - show transport type (#23051)
* Update AGENTS.md with additional Cursor Cloud setup notes - Add note about openapi-core dependency needed for OpenAPI compliance tests - Add note about poetry lock fallback when lock file is out of sync Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Sync lock files with current dependency specs - poetry.lock: regenerated to match pyproject.toml (litellm-proxy-extras 0.4.50 -> 0.4.51) - package-lock.json: updated from npm install Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Polish MCP Servers UI for enterprise-grade look and feel 10 improvements to the MCP Servers table and related components: 1. Remove debug console.logs from mcp_servers.tsx 2. Fix health status icons: distinct ✓/✗/? per state instead of identical dots 3. Health status badges: proper pill styling with rounded-full and borders 4. Health loading state: subtle pulsing dot instead of raw SVG spinner 5. Transport column: color-coded badges (HTTP=blue, SSE=purple, STDIO=amber, OPENAPI=teal) 6. Auth type column: color-coded badges (oauth2=indigo, bearer_token=sky, api_key=emerald) 7. Server ID chip: rounded corners, border, and transition effect 8. Filter bar: lighter border, cleaner labels, vertical divider between filters 9. Network Access: pill badges with colored dots (Public/Internal) 10. Date columns: shorter headers, dash for missing values, tooltip with full datetime Also: - Improved delete modal: cleaner layout, neutral background instead of red - Access Groups column: shows first group with +N count instead of truncated text - Empty state message includes CTA guidance - Updated test to match renamed filter label Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Polish MCP server detail views and table refinements (round 2) 10 more enterprise polish improvements: 1. Overview cards: use color-coded badges for Transport and Auth Type values 2. Overview cards: fix 'Host Url' typo -> 'Host URL', uppercase card labels 3. Settings tab: show em-dash placeholder for empty/missing values 4. Settings tab: use consistent Transport/Auth/Network badge styling matching table 5. Settings tab: definition-list layout with label/value grid columns 6. Server detail header: show server name prominently with alias as badge 7. Server detail header: show description below name, smaller server ID 8. Actions column: improved hover states with background color transitions 9. Credential column: pill badge for Connected state, shadow on Connect button 10. Table header: server count badge next to title, CTA button moved right Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Revert colorful transport/auth badges to neutral gray Color should only carry semantic meaning. Transport type (HTTP/SSE) and auth type (oauth2/bearer_token) are informational labels, not status indicators, so they use a uniform gray badge. Color remains on: - Health status: green (healthy), red (unhealthy) - Network access: green (public), orange (internal) - Credential: green (connected) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> |
||
|
|
83e237bac2
|
fix(chat): fix /ui/chat routing, Suspense boundary, serverRootPath support (#22945)
* fix(chat): fix router.push paths to use /ui/chat with serverRootPath support * fix(chat): wrap chat page in Suspense boundary for Next.js static export * fix(chat): fix clipboard writeText rejection handler - remove undefined message.error call * feat(chat): rebuild UI with routing fixes * fix(chat): use useTheme logoUrl + /get_image fallback for sidebar logo * feat(chat): rebuild UI with logo fix * fix(chat): use /get_image directly for logo (no ThemeProvider outside dashboard layout) * feat(chat): add multi-model comparison and provider logos in chat UI - Replace single model selector with multi-select (up to 3 models) - Show provider logos next to model names in dropdown (openai, anthropic, gemini, mistral, groq, etc.) - Selected models float to the top of the dropdown list - Multi-model mode: responses stream in parallel side-by-side cards below each user message - Multi-turn: each follow-up message carries full per-model history as context - Surface API errors inline in response cards instead of silently swallowing them - Rebuild UI |
||
|
|
ec600aa70a
|
feat(ui): add Chat UI — ChatGPT-like interface with MCP tools and streaming (#22937)
* feat(ui): add chat message and conversation types * feat(ui): add useChatHistory hook for localStorage-backed conversations * feat(ui): add ConversationList sidebar component * feat(ui): add MCPConnectPicker for attaching MCP servers to chat * feat(ui): add ModelSelector dropdown for chat * feat(ui): add ChatInputBar with MCP tool attachment support * feat(ui): add MCPAppsPanel with list/detail view for MCP servers * feat(ui): add ChatMessages component; remove auto-scrollIntoView that caused scroll-lock bypass * feat(ui): add ChatPage — ChatGPT-like UI with scroll lock, MCP tools, streaming * feat(ui): add /chat route wired to ChatPage * feat(ui): remove chat from leftnav — chat accessible via navbar button * feat(ui): add Chat button to top navbar * feat(ui): add dismissible Chat UI announcement banner to Playground page * feat(proxy): add Chat UI link to Swagger description * feat(ui): add react-markdown and syntax-highlighter deps for chat UI * fix(ui): replace missing BorderOutlined import with inline stop icon div * fix(ui): apply remark-gfm plugin to ReactMarkdown for GFM support * fix(ui): remove unused isEvenRow variable in MCPAppsPanel * fix(ui): add ellipsis when truncating conversation title * fix(ui): wire search button to chats view; remove non-functional keyboard hint * fix(ui): use serverRootPath in navbar chat link for sub-path deployments * fix(ui): remove unused ChatInputBar and ModelSelector files * fix(ui): correct grid bottom-border condition for odd server count * fix(chat): move localStorage writes out of setConversations updater (React purity) * fix(chat): fix stale closure in handleEditAndResend - compute history before async state update * fix(chat): fix 4 issues in ChatMessages - array redaction, clipboard error, inline detection, remove unused ref |
||
|
|
dd183a7fcb
|
[Feat] UI - Allow sorting MCPs by created_at, Display name date (#22825)
* Add column sorting to MCP servers table - Added sorting state management to DataTable component - Enabled getSortedRowModel for tanstack/react-table - Made column headers clickable with sort indicators (↑↓⇅) - Added enableSorting: true to sortable columns in mcp_server_columns - Columns now support ascending/descending sort by clicking headers - Updated package-lock.json and tsconfig.json from build process Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Make table sorting opt-in to avoid conflicts with existing consumers Address Greptile feedback (score 2/5): - Added enableSorting prop to DataTable (defaults to false) - Only enable sorting features when explicitly requested - Pass enableSorting=true from MCP servers component - This prevents unintended sorting on other DataTable consumers: * view_logs (has server-side sorting) * pass_through_settings * UsagePage - Sorting UI (indicators, click handlers) only shown when enabled Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> |
||
|
|
67f90254ed
|
feat(guardrails): team-based guardrail registration and approval workflow (#22459)
* feat(guardrails): team-based guardrail registration and approval workflow Add team-based guardrail submission system where teams can register Generic Guardrail API guardrails for admin review. Includes: - POST /guardrails/register endpoint for team-scoped submissions - Admin review endpoints (list/get/approve/reject submissions) - Team Guardrails tab in the UI dashboard - extra_headers support for forwarding client headers to guardrail APIs - Prisma schema migration for status, submitted_at, reviewed_at fields - Documentation for team-based guardrails and static/dynamic headers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(guardrails): address review feedback - SSRF, silent failure, redundant query - Validate api_base URL scheme (http/https only) and hostname in register_guardrail to prevent SSRF via team submissions - Return warning field in approve response when in-memory initialization fails so admins know the guardrail won't work until next sync cycle - Eliminate redundant DB query in list_guardrail_submissions by fetching all team guardrails once and deriving both filtered list and summary counts from the single result set Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(guardrails): add pending_review status guard to reject endpoint Prevent rejecting already-active or already-rejected guardrails, which would create a DB/memory inconsistency (active in memory but rejected in DB). Now mirrors the approve endpoint's status check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
29e3fd5d79
|
[Release Fix] (#22411)
* fix(lint): suppress PLR0915 for 3 complex methods that exceed 50-statement limit - streaming_iterator.py: _process_event (84 statements) - transformation.py: translate_messages_to_responses_input (51 statements) - transformation.py: transform_realtime_response (54 statements) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mypy): resolve type errors in public_endpoints, user_api_key_auth, common_utils, transformation - public_endpoints.py: fix _cached_endpoints type annotation - user_api_key_auth.py: accept Optional[str] for end_user_id parameter - common_utils.py: add NewProjectRequest/UpdateProjectRequest to Union type - transformation.py: add ChatCompletionRedactedThinkingBlock and list[Any] to content type Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(proxy-extras): bump version to 0.4.50 and sync schema - Bump litellm-proxy-extras from 0.4.49 to 0.4.50 - Sync schema.prisma with main proxy schema - Includes new LiteLLM_ClaudeCodePluginTable model - Includes new @@index([startTime, request_id]) on SpendLogs - Update version references in requirements.txt and pyproject.toml Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(router): use string id in test_add_deployment and add defensive str() in register_model - Change test to use string '100' instead of int 100 for model_info.id - Add str() conversion in register_model to prevent AttributeError on non-string keys Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(security): update minimatch to 10.2.4 to fix CVE-2026-27903 and CVE-2026-27904 - Run npm audit fix in docs/my-website - Updates minimatch from 10.2.1 to 10.2.4 (fixes HIGH severity ReDoS vulnerabilities) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): update realtime guardrail test assertions to match actual guardrail behavior - test_text_message_blocked_by_guardrail_no_ai_response: allow guardrail's own block message text in response.done (previously expected empty content) - test_voice_transcript_blocked_by_guardrail: allow guardrail to send response.cancel + block message + response.create flow (previously expected no response.create) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: revert proxy-extras version in requirements.txt and pyproject.toml The litellm-proxy-extras 0.4.50 is not published to PyPI yet, so consumer references must stay at 0.4.49. Only the source package pyproject.toml should be bumped to 0.4.50 for the publish_proxy_extras CI job. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: make transcript delta check optional in voice guardrail test The guardrail sends an error event (guardrail_violation) when blocking voice transcripts; it does not always produce transcript deltas. Remove the assertion requiring response.audio_transcript.delta since the error event is the primary signal that blocked content was handled. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add missing env keys to documentation: LITELLM_MAX_STREAMING_DURATION_SECONDS and LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES These two environment variables were used in code but not documented in the environment variables reference section of config_settings.md, causing the test_env_keys.py CI test to fail. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix 13 mypy type errors across 6 files - in_flight_requests_middleware.py: Fix type: ignore error codes from [union-attr] to [attr-defined], add [arg-type] for Gauge **kwargs - transformation.py: Add [assignment] ignore for output_format reassignment, add fallback empty string for tool use id to fix arg-type - responses/main.py: Remove redundant type annotation on second secret_fields assignment to fix no-redef - streaming_iterator.py: Add [assignment] ignores for intermediate cache token assignments - handler.py: Add [typeddict-item] ignore for AnthropicMessagesRequest construction from dict - public_endpoints.py: Add [arg-type] ignore for _load_endpoints() return type mismatch with SupportedEndpoint model Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: add auth overrides to spend tracking tests, fix realtime guardrail assertion, update UI minimatch - Add app.dependency_overrides for user_api_key_auth in 4 spend tracking tests that were returning 401 Unauthorized (error_code, error_message, error_code_and_key_alias, key_hash) - Fix realtime guardrail test to check ANY error event for guardrail_violation instead of just the first (OpenAI may send its own errors first) - Update ui/litellm-dashboard/package-lock.json to fix minimatch vulnerability Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix failing MCP e2e and create_mcp_server UI tests Test 1 (test_independent_clients_no_shared_session): - Add allow_all_keys: true to MCP servers in test config. With master_key and no DB, get_allowed_mcp_servers returned empty, causing 0 tools and 403 on tool calls. allow_all_keys bypasses per-key restrictions. - Add asyncio.sleep(0.5) between client connections to allow MCP SDK TaskGroup cleanup and avoid ExceptionGroup on connection close (MCP #915). Test 2 (create_mcp_server 'auth value is provided'): - Use userEvent.setup({ delay: null }) for instant keystrokes to avoid timeout from default typing delay on CI. - Increase per-test timeout to 15000ms for CI environments. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: stabilize proxy unit tests for parallel execution - test_response_polling_handler: add xdist_group to prevent heavy import OOM - test_db_schema_migration: use temp dir for worker isolation, sync schema.prisma index - test_custom_tokenizer_bug: use lighter tokenizer to prevent OOM in parallel Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: add auth overrides to more spend tracking and model info tests - Fix test_ui_view_spend_logs_pagination missing auth override (401) - Fix test_view_spend_tags missing auth override (401) - Fix test_view_spend_tags_no_database missing auth override (401) - Fix test_empty_model_list.py to use app.dependency_overrides instead of patch() for FastAPI dependency injection auth Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): use patch.object for aiohttp transport test to work in parallel execution The @patch decorator was not intercepting the static method call in parallel xdist workers. Using patch.object on the directly-imported class is more reliable. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(security): update minimatch from 10.2.1 to 10.2.4 in Dockerfile The Docker image was explicitly pinning minimatch@10.2.1 which has HIGH severity ReDoS vulnerabilities (GHSA-7r86-cg39-jmmj, GHSA-23c5-xmqv-rm74). Update to 10.2.4 which includes fixes for both CVEs. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ui): prevent MCP and TeamInfo test timeouts on CI - Add userEvent.setup({ delay: null }) to all tests using userEvent in both files - Add timeout: 15000 to tests with significant user interaction (typing, multiple clicks) - Fixes: create_mcp_server Bearer Token test, TeamInfo cancel button test Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: stabilize parallel test execution and aiohttp transport test - test_aiohttp_handler: rewrite transport test to not rely on static method mock (consistently fails in parallel xdist workers) - test_proxy_cli: add xdist_group to prevent timeout during heavy imports - test_swagger_chat_completions: add xdist_group to prevent timeout Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(security): add serialize-javascript override to fix GHSA-5c6j-r48x-rmvq Add npm override for serialize-javascript>=7.0.3 in docs/my-website to fix HIGH severity RCE vulnerability via RegExp.flags. Also bump minimatch override to >=10.2.4. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix flaky tests: remove broken Vertex model, add retries for Anthropic - Remove vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas from test_partner_models_httpx_streaming - consistently returns 400 BadRequest - Add @pytest.mark.flaky(retries=6, delay=10) to test_function_call_parsing for transient Anthropic API overload errors - Add @pytest.mark.flaky(retries=6, delay=10) to test_openai_stream_options_call for transient Anthropic InternalServerError Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ci): add xdist_group(proxy_heavy) to prevent OOM in parallel proxy tests - Add pytestmark = pytest.mark.xdist_group('proxy_heavy') to test_proxy_utils.py - Change test_db_schema_migration.py from schema_migration to proxy_heavy group - Add @pytest.mark.xdist_group('proxy_heavy') to test_proxy_server.py::test_health Groups heavy proxy tests to run on same worker, avoiding worker OOM crashes. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix vertex AI qwen global endpoint test to mock vertexai module import The test_vertex_ai_qwen_global_endpoint_url test was failing because the VertexAIPartnerModels.completion() method tries to 'import vertexai' before any of the mocked code runs. In environments without google-cloud-aiplatform installed, this import fails with a VertexAIError(status_code=400). Fix by: - Adding patch.dict('sys.modules', {'vertexai': MagicMock()}) to mock the vertexai module import - Adding vertex_ai_location parameter to the acompletion call for completeness Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ci): add xdist_group to health endpoint and watsonx tests for parallel stability - test_health_liveliness_endpoint: add xdist_group('proxy_health') to prevent timeout - test_watsonx_gpt_oss tests: add xdist_group('watsonx_heavy') to prevent mock interference Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): pre-populate WatsonX IAM token cache to prevent parallel test interference The watsonx prompt transformation test was failing in parallel execution because litellm.module_level_client.post mock was being interfered with by other tests. Pre-populating the IAM token cache avoids the HTTP call entirely. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): add spend data polling with retries for e2e pass-through tests - test_vertex_with_spend.test.js: Replace 15s fixed wait with polling loop (up to 6 attempts, 10s apart) for spend data to appear in DB - Increase test timeout from 25s to 90s to accommodate polling - base_anthropic_messages_tool_search_test.py: Add flaky(retries=3) for streaming test that depends on live Anthropic API Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ci): reduce parallel workers from 8 to 4 for proxy tests to prevent OOM - litellm_proxy_unit_testing_part2: -n 8 -> -n 4 - litellm_mapped_tests_proxy_part2: -n 8 -> -n 4, timeout 60 -> 120 - Worker crashes consistently caused by too many parallel proxy tests each loading the full FastAPI app and heavy dependency tree Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(db): add migration for SpendLogs composite index (startTime, request_id) The @@index([startTime, request_id]) was added to schema.prisma but had no corresponding migration. This caused test_aaaasschema_migration_check to fail because prisma migrate diff detected the missing index. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(db): add migration for MCP available_on_public_internet default change to true The schema.prisma changed the default for available_on_public_internet from false to true, but no migration was created. This caused the schema migration test to detect drift. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): increase server wait time and add retry to flaky external API tests - test_basic_python_version.py: increase server startup wait from 60s to 90s for slower CI environments (fixes installing_litellm_on_python_3_13) - test_a2a_agent.py: add flaky(retries=3, delay=5) for non-streaming test that depends on live A2A agent endpoint Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): add flaky retries to all intermittent external API tests for 0-fail CI Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(test): add auth overrides to file endpoint tests that return 500 The test_target_storage tests were getting 500 because the FastAPI auth dependency wasn't overridden. Added app.dependency_overrides for proper auth bypass in test environment. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> |
||
|
|
8ce358e303
|
[Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)
* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete
- Add /v1/agents/{agent_id} to agent_routes so internal users can
access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: mock prisma_client in internal user get-agent-by-id test
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* feat(ui): hide agent create/delete controls for non-admin users
Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
|
||
|
|
c3c79b9d80 |
[Fix] Virtual Keys pagination displays stale totals when filtering
When applying a key_alias filter on the Virtual Keys page, the pagination display (total count, page count) showed stale unfiltered values. The bug was that the table component tracked total_count only from the unfiltered useKeys hook, not from the filtered API response returned by useFilterLogic. Fixes: When selecting a key alias that matches 1 key, the UI previously showed "Showing 1-50 of 509 results" and "Page 1 of 11" instead of the correct "Showing 1-1 of 1 results" and "Page 1 of 1". Changes: - Added filteredTotalCount state to useFilterLogic to track the total_count from filtered API responses - Updated VirtualKeysTable to use filteredTotalCount (when set) instead of always using the unfiltered total - Added comprehensive tests to prevent regression of pagination display logic Type: 🐛 Bug Fix, ✅ Test Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> |
||
|
|
3545584a00
|
Development environment setup (#22160)
* feat: add pretty view for realtime API logs in dashboard - Create RealtimePrettyView component that renders structured session config, conversation turns with transcripts, and token breakdowns - Update PrettyMessagesView to detect realtime responses (via isRealtimeResponse helper) and delegate to the new component - Session card shows model, voice, modalities, temperature, instructions in a collapsible panel - Conversation turns show status, per-turn token usage, and audio/text transcripts with appropriate icons - Add 24 tests for RealtimePrettyView and 3 tests for PrettyMessagesView - All 75 LogDetailsDrawer tests pass Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * chore: remove dev_config.yaml from tracked files Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * feat: show turn count in realtime pretty view session header and output header - Add purple 'N turns' tag to Session card header for at-a-glance turn count - Add 'Turns: N' to the Output section header next to tokens/cost - Extend SectionHeader to accept optional turnCount prop - Add 3 new tests for turn count display (singular, plural, output header) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: address Greptile review feedback - Remove response.audio.done and conversation.item.created from isRealtimeResponse() detection since the view doesn't render them; prevents misleading fallback for responses with only those events - Remove dead code: index >= 0 is always true in .map() callback Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> |
||
|
|
12c4876891
|
Agents - assign tools (#22064)
* feat(proxy): add max_iterations limiter for agent session loops (#22058) Adds a new proxy hook that enforces a per-session cap on the number of LLM calls an agentic loop can make. Callers send a session_id with each request, and the hook counts calls per session, returning 429 when the configured max_iterations limit is exceeded. - Uses Redis Lua script for atomic increment (multi-instance safe) - Falls back to in-memory cache when Redis unavailable - Follows parallel_request_limiter_v3 pattern - Configurable via key metadata: {"max_iterations": 25} - Session counters auto-expire via TTL (default 1hr) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: add new code execution dataset * feat(agent_endpoints/): allow giving agents keys * fix: ui fixes * feat: allow assigning mcp servers to agents * fix: eliminate duplicate DB queries in MCP agent auth and N+1 in agent listing (#22110) - Extract _get_agent_object_permission helper so _get_allowed_mcp_servers_for_agent and _get_agent_tool_permissions_for_server share a single DB fetch instead of each independently querying the same agent row (was 1+N queries per MCP request) - Use include={"object_permission": True} on find_many in get_all_agents_from_db to eagerly load permissions in one query instead of N+1 - Use include={"object_permission": True} on create/update/find_unique in all agent CRUD operations, removing attach_object_permission_to_dict follow-up calls Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6ae7e84a0b |
[Test] UI - Pricing Calculator: Add comprehensive unit tests
Added unit tests for all pricing calculator components with 64 passing tests across 5 test files: - multi_export_utils.test.ts (16 tests for PDF/CSV export functions) - use_multi_cost_estimate.test.ts (15 tests for cost estimation hook) - multi_export_dropdown.test.tsx (8 tests for export dropdown component) - multi_cost_results.test.tsx (15 tests for results display and UI states) - index.test.tsx (10 tests for main calculator component) Also fixed missing page description for tool-policies page in page_metadata.ts. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> |
||
|
|
360643e213
|
[Feat] UI - Allow using AI to understand Usage patterns (#22042)
* Add Ask AI chat component to Usage page - Create UsageAIChatModal component with streaming chat interface - Integrate with existing model hub for model selection - Pass usage data context (spend, models, providers, keys) to AI - Add Ask AI button next to Export Data button in global view - Add tests for the new component and integration Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Convert Ask AI from modal to right-side sliding panel - Replace UsageAIChatModal with UsageAIChatPanel - Panel slides in from right side, usage page stays visible - Full-height panel with header, model selector, chat area, and input - Smooth CSS transition for open/close animation - Update tests for new panel component (34 tests passing) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Remove build output directory from tracking Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add backend AI usage chat endpoint with tool calling Backend: - New /usage/ai/chat SSE streaming endpoint - AI agent has get_usage_data tool that queries /user/daily/activity/aggregated - Follows same architecture as policy AI suggest (litellm.acompletion + tools) - Non-admin users are restricted to their own data - 12 backend unit tests Frontend: - Panel now calls /usage/ai/chat backend endpoint via SSE - Removed direct OpenAI client calls from frontend - Added usageAiChatStream networking function following enrichPolicyTemplateStream pattern Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Make model selection optional, default to gpt-4o-mini on backend Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Add team/tag tools, status indicators, and improved AI agent - AI agent now has 3 tools: get_usage_data, get_team_usage_data, get_tag_usage_data - Stream status events (Thinking... Fetching... Analyzing...) to UI - Frontend shows spinner + status text during tool execution - Better system prompt guiding tool selection - Entity summariser for team/tag data with ranked breakdowns - 13 backend tests, 34 frontend tests passing Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix: inject today's date into system prompt so AI resolves relative dates correctly Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Show tool calls as distinct steps + render markdown in responses - Backend emits tool_call events with tool_name, label, args, and status - Frontend shows each tool call as a step with ✓/spinner/✗ indicator - Tool call steps show icon, label, date range, and filters - AI responses rendered with ReactMarkdown (bold, lists, tables, code) - Cursor-like UX: Thinking → tool calls → Analyzing → streamed answer Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Refactor backend for code quality: proper types, constants, all functions ≤50 LOC - TypedDict for SSE events (SSEStatusEvent, SSEToolCallEvent, etc.) and ToolHandler - Constants for table names, entity fields, temperature, page sizes, top-N limits - Shared _query_activity() eliminates duplicated fetch logic - _accumulate_breakdown() + _ranked_lines() replace inline aggregation loops - Extracted _process_tool_call() and _stream_final_response() from main stream fn - Black + Ruff clean, all 15 functions verified ≤50 LOC - Replaced Tremor Button with Antd Button in panel (Tremor deprecated per AGENTS.md) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Address greptile review: security fixes and input validation - Restrict team/tag tools to admin-only users (non-admins only get get_usage_data) - Constrain ChatMessage.role to Literal['user', 'assistant'] to prevent system prompt injection - Add test for base tools restriction (non-admin gets 1 tool, admin gets 3) - Issues 3 (unused imports) and 4 (inline datetime) were already fixed in prior commit Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Address greptile round 2: sanitize errors, defense-in-depth allowlist, revert tsconfig - Sanitize error messages: generic 'An internal error occurred' sent to client, full exception logged server-side via verbose_proxy_logger - Defense-in-depth: _process_tool_call validates fn_name against role-based allowlist before dispatch (even though LLM only receives allowed tools) - Revert tsconfig.json jsx back to 'preserve' (Next.js recommended default) Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Role-scoped system prompt + additional test coverage - System prompt is now role-aware: admin sees all 3 tool descriptions, non-admin only sees get_usage_data (consistent with tool filtering) - Added tests: non-admin prompt excludes team/tag tools, date injection - 15 backend tests, 34 frontend tests passing Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * Fix LLM arg validation + cap conversation size at 20 messages - _resolve_fetch_kwargs uses .get() with ValueError for missing dates (handles malformed LLM tool arguments gracefully) - MAX_CHAT_MESSAGES = 20 constant; backend truncates to last 20 - Frontend also sends only last 20 messages per request - Prevents excessive token usage and context-length errors Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> |
||
|
|
a385126a7c |
Litellm dev compliance UI (#21457)
* feat(ui/): initial commit adding a compliance testing playground allow proxy admins to test policies and guardrails against datasets * feat(ui/): make score more friendly * feat(policy_endpoints.py): new helper function for testing policies * feat(policy_endpoints.py): expose new endpoint for testing policies and guardrails enables compliance playground to work as expected * feat(complianceui.tsx): show returned text |
||
|
|
1a8525d02a
|
Add GDPR Art. 32 EU PII Protection Policy Template (#21340)
* Add 6 new EU PII patterns for GDPR compliance
- fr_nir: French Social Security Number (NIR/INSEE) with validation
- eu_iban_enhanced: Enhanced IBAN detection with specific format
- fr_phone: French phone numbers (+33, 0033, 0 formats)
- eu_vat: EU VAT identification numbers (all 27 member states)
- eu_passport_generic: Generic EU passport format
- fr_postal_code: French postal codes with contextual keywords
* Add GDPR Art. 32 EU PII Protection policy template
- Comprehensive GDPR Article 32 compliance policy
- 4 guardrail groups: National IDs, Financial, Contact Info, Business IDs
- Masks French NIR/INSEE, EU IBANs, French phones, EU VAT numbers
- Includes EU passport numbers and email addresses
- Medium complexity template with indigo icon
* Add comprehensive tests for EU PII patterns
- Test French NIR validation (sex digit, month range)
- Test enhanced IBAN detection (French, German)
- Test French phone number formats
- Test EU VAT numbers
- Test generic EU passport format
- Test French postal code pattern
* Add EU pattern loading and category validation tests
- Verify all 6 EU PII patterns are loaded correctly
- Verify patterns are categorized as 'EU PII Patterns'
- Ensure pattern loading consistency
* Add end-to-end tests for GDPR policy template
- 4 tests for PII that should be masked (NIR, IBAN, phone, VAT)
- 4 tests for text that should pass through (invalid patterns, no PII)
- 1 bonus test for multiple PII types in same message
- All tests verify correct masking behavior
* Add region field to policy templates
- Added region field to all 6 templates (EU, AU, Global)
- Updated both main and backup JSON files
- Enables region-based filtering in UI
* Add region filter to policy templates UI
- Added Radio.Group filter for regions (All, AU, EU, Global)
- Efficient filtering with useMemo hooks
- Clean button-based UI matching existing design
- Defaults missing regions to Global
* Apply suggestion from @greptile-apps[bot]
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Address Greptile review: add contextual guards and negative tests
- Added keyword_pattern to eu_vat (VAT, tax number, fiscal code, etc.)
- Added keyword_pattern to eu_passport_generic (passport, travel document, etc.)
- Added 3 negative unit tests for false positive prevention
- Added 2 E2E tests verifying no masking without keyword context
- All patterns now require contextual keywords to prevent false positives
* address greptile review feedback (greploop iteration 1)
- Remove unused HTTPException import from test file
- Add keyword_pattern to eu_vat for contextual VAT matching
- Add allow_word_numbers: false to eu_passport_generic
- Add negative test cases for EU VAT false positives
- All 5 Greptile comments addressed
* Address Greptile feedback: fix patterns and sync backup
- Fix fr_phone pattern: use negative lookbehind (?<!\d) to prevent false matches in longer digit strings
- Add keyword_pattern to eu_passport_generic to reduce false positives on version strings/SKUs
- Sync policy_templates_backup.json with main file (add GDPR template)
- Add keyword_pattern to eu_vat (was auto-added by formatter)
All pattern tests passing
* address greptile review feedback (greploop iteration 2)
- Update test to document that eu_vat raw pattern is intentionally broad
- Test verifies pattern DOES match common words (by design)
- Documents that keyword_pattern guard prevents false positives in production
- Addresses Greptile's false positive risk concern
* address greptile review feedback (greploop iteration 3)
- Fix test_eu_vat_masked: change gap from 2 words to 1 word (VAT number: FR...)
- This ensures keyword matching works within MAX_KEYWORD_VALUE_GAP_WORDS=1 limit
- fr_phone pattern already works correctly (verified with tests)
- test_pattern_requires_keyword_context already updated in iteration 2
Addresses final issues from Greptile 2/5 review
* fix: remove country-specific passport patterns from GDPR template
- Remove passport_france, passport_germany, passport_netherlands from template
- These patterns lack keyword guards and cause false positives
- Only eu_passport_generic remains (has keyword_pattern guard)
- Sync policy_templates_backup.json with main file
- Update test setup to match template
All 11 E2E tests now passing ✅
* Update tests/test_litellm/proxy/guardrails/guardrail_hooks/content_filter/test_gdpr_policy_e2e.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/content_filter.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
||
|
|
10d891a365
|
Guardrails - add logging to all unified_guardrails + link to custom code guardrail templates (#20900)
* feat(guardrail_hooks/): add guardrail logging to all unified guardrails ensures unified guardrails use the 'log_guardrail_information' decorator for logging * fix(custom_guardrail.py): don't log inputs on guardrail response - just emit state * refactor: don't double log bedrock guardrail information * feat: add in-product nudges for contributing + trying community custom code guardrails allows users to contribute / share custom code guardrails |
||
|
|
54828e3783 | add knip as a dev dependency, remove some unused files | ||
|
|
51af66fdb2 | ui new buil | ||
|
|
b8876838a6 | revert react 18 | ||
|
|
dc5c8c8918 | react 19 | ||
|
|
c68baa3943 | upgrade react version | ||
|
|
b62f46ec5b | Update next to 16.1.6 | ||
|
|
3cb47dae44 | updating lodash for dashboard | ||
|
|
685437c9cc | Adding help scripts for neon | ||
|
|
936aa6821f
|
[Fix] CI/CD - litellm_security_tests (#18567) | ||
|
|
a1849a152c | Playwright setup in UI directory | ||
|
|
e3f1ce0138 | bumping docusaurus/theme-mermaid to 3.9.0 | ||
|
|
6f6a8f782e | Fix UI settings | ||
|
|
83291d394e
|
build(deps): bump mdast-util-to-hast in /ui/litellm-dashboard (#17444)
Bumps [mdast-util-to-hast](https://github.com/syntax-tree/mdast-util-to-hast) from 13.2.0 to 13.2.1. - [Release notes](https://github.com/syntax-tree/mdast-util-to-hast/releases) - [Commits](https://github.com/syntax-tree/mdast-util-to-hast/compare/13.2.0...13.2.1) --- updated-dependencies: - dependency-name: mdast-util-to-hast dependency-version: 13.2.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
5fb7530d8c
|
build(deps): bump jws from 3.2.2 to 3.2.3 in /ui/litellm-dashboard (#17494)
Bumps [jws](https://github.com/brianloveswords/node-jws) from 3.2.2 to 3.2.3. - [Release notes](https://github.com/brianloveswords/node-jws/releases) - [Changelog](https://github.com/auth0/node-jws/blob/master/CHANGELOG.md) - [Commits](https://github.com/brianloveswords/node-jws/compare/v3.2.2...v3.2.3) --- updated-dependencies: - dependency-name: jws dependency-version: 3.2.3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |
||
|
|
b487e67dec | sec fix | ||
|
|
3ba3faefb8 | fix sec scan | ||
|
|
1adaf043db | fix TYPE_CHECKING + security | ||
|
|
4be0c6a226
|
JSONpanel (#16687) | ||
|
|
acad73018d | fix pkg lock | ||
|
|
e037d9315d
|
Add Vertex and Gemini Videos API with Cost Tracking + UI support (#16323)
* Use video id for videos api * remove mock code * Potential fix for code scanning alert no. 3630: Clear-text logging of sensitive information Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * remove print statements * Update video prefix for 'video_' * Add veo with openai videos unified specs * Add videos testing to UI * remove mock code * Remove not need ui changes: * Fix mypy errors related to gemini * fix test_transform_video_create_request * Add vertex ai veo config * Add vertex ai veo config * Add cost tracking for gemini and add optional param passing * fix bugs related to vertex ai veo * Add Gemini Veo Video Generation in Openai Videos Unified Spec (#16229) * Add veo with openai videos unified specs * Add videos testing to UI * remove mock code * Remove not need ui changes: * Fix mypy errors related to gemini * fix test_transform_video_create_request * Add contant video duration for gemini and vertex * Fix litellm_mapped_tests tests * fix azure videos issue * Added doc for videos vertex ai * fix seconds param error * fix lint errors * test_transform_video_create_response_cost_tracking_no_duration --------- Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> |