* fix(bedrock-mantle): use /anthropic/v1/messages path for Mantle endpoint (#27943)
* docs: add one-line docstring to _disable_debugging (#27894)
Squash-merged by litellm-agent from oss-agent-shin's PR.
* Add jp. Bedrock cross-region inference profile for claude-sonnet-4-6 (#27831)
Squash-merged by litellm-agent from Cyberfilo's PR.
* Sanitize empty text content blocks on /v1/messages (#27832)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix(bedrock-mantle): use /anthropic/v1/messages path for Mantle endpoint
The bedrock-mantle gateway (Claude Mythos Preview) serves the Anthropic
Messages API at /anthropic/v1/messages; /v1/messages returns 404 Not
Found. Both AmazonMantleConfig (chat/completions caller route) and
AmazonMantleMessagesConfig (anthropic-messages caller route) hardcoded
the wrong path, so every Mantle request 404'd before reaching the model.
Per the Anthropic docs: "[Claude in Amazon Bedrock] uses the Messages
API at /anthropic/v1/messages with SSE streaming."
https://platform.claude.com/docs/en/api/claude-on-amazon-bedrock
Confirmed independently against the live endpoint:
/v1/chat/completions -> 200 OK
/v1/messages -> 404 Not Found (what litellm used)
/anthropic/v1/messages -> 200 OK (Claude only)
Adds a regression test asserting both Mantle configs build the
/anthropic/v1/messages path, and updates the existing assertions that
encoded the wrong path.
---------
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
* fix: sanitize empty text blocks in sync anthropic_messages_handler path
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: João Costa <13508071+jpv-costa@users.noreply.github.com>
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(cost): align vertex_ai/gemini-embedding-2-preview with Vertex multimodal pricing
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(cost): align vertex_ai/gemini-embedding-2 GA source URL with preview
Per Greptile review on #27848: GA entry referenced ai.google.dev while
the preview entry was updated to the canonical Vertex AI pricing page.
Both share identical pricing values; sync the source URL for consistency.
https://claude.ai/code/session_01W8jRwstnmduadGw8Z8egxe
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude <noreply@anthropic.com>
* feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json
xAI's docs page now lists grok-4.3 as the recommended chat / coding model:
"We strongly recommend all API callers use grok-4.3. It is the most
intelligent and fastest model we've built." (https://docs.x.ai/docs/models)
Pricing/specs sourced from xAI's published model metadata:
- input: $1.25 / 1M tokens (<=200k), $2.50 / 1M tokens (>200k)
- output: $2.50 / 1M tokens (<=200k), $5.00 / 1M tokens (>200k)
- cached: $0.20 / 1M tokens (<=200k), $0.40 / 1M tokens (>200k)
- context: 1,000,000 tokens
- capabilities: vision, reasoning, function calling, structured outputs,
prompt caching, web search
Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias),
mirroring the pattern used for the rest of the xAI/Grok-4 family.
* test(xai): add model_info test for grok-4.3 + sync backup cost map
- Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into
litellm/model_prices_and_context_window_backup.json so the bundled
model cost map matches the canonical model_prices_and_context_window.json.
- Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering
pricing tiers, capability flags, context window, provider routing,
and parity between the main and backup cost maps.
- Point 'source' at the live xAI models page (the per-model URL
https://docs.x.ai/docs/models/grok-4.3 currently 404s).
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
---------
Co-authored-by: shin-watcher <shin-watcher@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Three of greptile's open comments on #27074 (P2 converse:512, P1
databricks:361, and the underlying capability-flag policy rule) flagged
the same pattern: _is_claude_4_6_model(...) or _is_claude_4_7_model(...)
used inline as a runtime 'is this an adaptive-thinking model?' check.
That requires a code release each time a new adaptive Claude lands.
Consolidate the inline gating to AnthropicModelInfo._is_adaptive_thinking_model,
and switch the helper itself to read a new supports_adaptive_thinking
flag from `model_prices_and_context_window.json` via `_supports_factory`,
falling back to the family pattern only when the model-map entry doesn't
carry the flag (preserves OpenRouter / Vercel / Bedrock-prefixed variants
that route through the same code path with non-canonical ids).
Adds `supports_adaptive_thinking: true` to the 4.6/4.7 anthropic
entries (opus-4-6 + dated, opus-4-7 + dated, sonnet-4-6). Bedrock-prefixed
and Vertex-prefixed entries don't need the flag because both fall back
through the family pattern (the helper short-circuits early on True from
either path) and the bedrock/vertex Claude IDs all match the existing
opus-4-{6,7} / sonnet-4-{6,7} pattern.
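A minimal sketch of the flag-first, pattern-fallback lookup described above.
MODEL_MAP, flag_from_map and the regex are illustrative stand-ins, not the
real cost map, `_supports_factory`, or the helper's actual pattern:

```python
import re
from typing import Optional

# Hypothetical stand-in for model_prices_and_context_window.json.
MODEL_MAP = {
    "claude-opus-4-7": {"supports_adaptive_thinking": True},
    "claude-sonnet-4-5": {"supports_reasoning": True},
}

# Assumed family pattern, consulted only when the map gives no True.
_ADAPTIVE_FAMILY = re.compile(r"(opus|sonnet)-4-[67]")


def flag_from_map(model: str) -> Optional[bool]:
    """Return the flag when the entry carries it, else None."""
    entry = MODEL_MAP.get(model)
    if entry is None:
        return None
    return entry.get("supports_adaptive_thinking")


def is_adaptive_thinking_model(model: str) -> bool:
    # Short-circuit on True from the model map.
    if flag_from_map(model) is True:
        return True
    # Otherwise fall back to the family pattern, which also covers
    # prefixed ids that have no canonical map entry.
    return bool(_ADAPTIVE_FAMILY.search(model.lower()))
```

With this shape, a new adaptive model only needs a JSON flag; the pattern
stays as a safety net for non-canonical ids.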
Affected call sites:
- `bedrock/chat/converse_transformation.py:_handle_reasoning_effort_parameter`
- `anthropic/chat/transformation.py:_map_reasoning_effort`
- `anthropic/chat/transformation.py:map_openai_params` (output_config branch)
- `databricks/chat/transformation.py:map_openai_params` (output_config branch)
The remaining `_is_claude_4_6_model` / `_is_claude_4_7_model` references
in `AnthropicConfig._validate_effort_for_model` and
`AnthropicConfig.get_supported_openai_params` are intentionally retained:
they're per-model gating fallbacks for variants whose model-map entries
don't yet carry the `supports_max_reasoning_effort` /
`supports_reasoning` flag. Those are documented in-place.
Tests: 537 anthropic/bedrock/databricks/vertex/messages tests pass.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
- claude-sonnet-4-6 + reasoning_effort=max no longer 400s. Renamed
_is_opus_4_6_model to _is_claude_4_6_model at three sites and added
supports_max_reasoning_effort: true to 12 model entries in the JSON
cost map (10 sonnet 4.6 ids + OpenRouter opus 4.6/4.7).
- _map_reasoning_effort now raises BadRequestError(400) directly with
llm_provider, instead of letting Databricks (and similar callers)
surface its raw ValueError as a 500.
- output_config.effort on Opus 4.5 over Bedrock no longer 400s for
missing effort-2025-11-24 beta. Flipped JSON to "effort-2025-11-24"
for bedrock + bedrock_converse and added an auto-attach branch in
_process_tools_and_beta for non-adaptive Anthropic + output_config
on Converse.
- reasoning_effort=xhigh / =max on legacy budget-mode models
(Haiku 4.5, Sonnet 4.5, Opus 4.5) now map to thinking.budget_tokens
8192 / 16384 instead of returning 400. Added two constants in
litellm/constants.py.
Tests updated for all four flips. Validated end-to-end via 306-cell
live proxy matrix (6 model families x 3 routes x 17 effort cases),
all pass.
Replace hardcoded _EFFORT_SUPPORTING_MODEL_PATTERNS with a JSON-backed
check that uses supports_*_reasoning_effort flags from the model map.
Add supports_minimal_reasoning_effort: true to opus-4-5 and mythos-preview
entries (which previously only carried supports_reasoning) so the JSON
remains the single source of truth for effort capability.
Follow-up bugs surfaced by the QA sweep on PR #27039
(https://github.com/BerriAI/litellm/pull/27039#issuecomment-4363363610).
1. Stop stripping output_config.effort on Bedrock + Vertex adaptive routes.
- Vertex AI Claude 4.6/4.7 accepts output_config.effort on rawPredict
(verified end-to-end against us-east5 / global). The strip helper now
no-ops for effort.
- Bedrock Converse routes output_config into additionalModelRequestFields
for anthropic base models so the requested adaptive tier (low/medium/
high/xhigh/max) actually reaches the wire instead of every tier collapsing to
the same thinking configuration.
- Bedrock Invoke chat transformation (AmazonAnthropicClaudeConfig) stops
popping output_config from the post-AnthropicConfig request body.
- Bedrock Invoke /v1/messages allowlist (BedrockInvokeAnthropicMessagesRequest)
now lists output_config so the runtime allowlist filter forwards it.
2. Validate effort across Bedrock Converse so 'disabled' / 'invalid' / '' /
unsupported tiers (xhigh/max on Sonnet 4.6 or budget-mode 4.5 models)
surface as a clean 400 BadRequestError instead of 500.
3. ValueError -> BadRequestError throughout (AnthropicConfig.map_openai_params,
_apply_output_config, AmazonConverseConfig._handle_reasoning_effort_parameter).
Empty-string effort is now rejected (was silently passing the
'if effort and ...' short-circuit).
4. Floor reasoning_effort='minimal' at the Anthropic provider minimum
(1024 budget_tokens) via new ANTHROPIC_MIN_THINKING_BUDGET_TOKENS so it's
a usable tier on direct Anthropic / Azure AI Anthropic / Vertex AI Anthropic /
Bedrock Invoke (all of which 400 below 1024).
5. model_prices: dedupe duplicate supports_max_reasoning_effort key on
claude-opus-4-7 / claude-opus-4-7-20260416.
Adds regression tests across all five affected paths; existing tests asserting
the silent-strip behavior were updated to reflect the new pass-through and
clean 400 surfaces.
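A rough sketch of items 3 and 4, assuming a local BadRequestError stand-in
and hypothetical helper names (validate_effort, floor_budget). Only the
1024 floor and the constant name come from the text above:

```python
ANTHROPIC_MIN_THINKING_BUDGET_TOKENS = 1024  # provider minimum stated above


class BadRequestError(Exception):
    """Local stand-in for a 400-style error; not litellm's class."""

    status_code = 400


def validate_effort(effort: str, supported: set) -> str:
    # Check emptiness explicitly so "" cannot slip past a truthiness
    # guard such as `if effort and ...`.
    if effort == "":
        raise BadRequestError("reasoning effort must not be empty")
    if effort not in supported:
        raise BadRequestError(f"unsupported reasoning effort: {effort!r}")
    return effort


def floor_budget(requested_tokens: int) -> int:
    # Never send a thinking budget below the provider minimum.
    return max(requested_tokens, ANTHROPIC_MIN_THINKING_BUDGET_TOKENS)
```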
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
- Route publisher/model ids (e.g. xai/grok) to .../endpoints/openapi; keep model in JSON body
- Add model_prices keys for vertex_ai/openai/xai/grok-*
- Document xAI Grok on vertex_partner (aligned with GPT-OSS)
- Add tests for create_vertex_url and body-model heuristic
Made-with: Cursor
- Streaming example referenced Llama-3.1 instead of Llama-3.3
- Add supports_vision: true for gemma-3-12b-it in both JSON files,
matching other providers (bedrock, novita)
These are reasoning/thinking models but were missing the flag, causing
litellm.supports_reasoning() to return False and reasoning-token handling
to not activate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AWS Bedrock pricing publishes a separate 1-hour prompt-cache write rate for
Claude 4.5 / 4.6 / 4.7 (1.6x the 5-minute rate). Without
`cache_creation_input_token_cost_above_1hr`, cost tracking for 1-hour-TTL
prompt caching on Bedrock falls back to the 5-minute rate and understates
cache-write spend by 37.5% (the billed amount is 60% above what is tracked).
Adds the field to the spot-checked Global and US-region entries:
- anthropic.claude-opus-4-7 (Global $10.00 / MTok)
- anthropic.claude-opus-4-6-v1 (Global $10.00 / MTok)
- anthropic.claude-opus-4-5-... (Global $10.00 / MTok)
- anthropic.claude-sonnet-4-6 (Global $6.00 / MTok)
- anthropic.claude-sonnet-4-5-... (Global $6.00 / MTok regular,
$12.00 / MTok long-context >200K)
- anthropic.claude-haiku-4-5-... (Global $2.00 / MTok)
- global.anthropic.* mirrors of the above
- us.anthropic.* mirrors at the US +10% premium
Also updates the long-context (>200K) variants of Sonnet 4.5 with
`cache_creation_input_token_cost_above_1hr_above_200k_tokens`.
The mirrored entries in `litellm/model_prices_and_context_window_backup.json`
are updated in lockstep.
EU / AU / APAC / JP / us-gov regional variants are out of scope for this
change pending separate verification against AWS Bedrock pricing for those
regions.
Adds tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py to lock
in the expected values and the 1.6x ratio invariant.
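Worked arithmetic for the 1.6x ratio, as a sketch. The $10.00 / MTok figure
is the Global Opus rate listed above; the $6.25 5-minute rate is an
assumption derived from it ($10.00 / 1.6), and cache_write_cost is a
hypothetical helper:

```python
ONE_HOUR_WRITE_PER_MTOK = 10.00   # Global Opus 1-hour rate from the list above
FIVE_MIN_WRITE_PER_MTOK = 6.25    # assumed: 10.00 / 1.6


def cache_write_cost(tokens: int, rate_per_mtok: float) -> float:
    """Cost in USD for writing `tokens` to the prompt cache."""
    return tokens / 1_000_000 * rate_per_mtok


tokens = 2_000_000
tracked = cache_write_cost(tokens, FIVE_MIN_WRITE_PER_MTOK)  # fallback rate
actual = cache_write_cost(tokens, ONE_HOUR_WRITE_PER_MTOK)   # 1-hour rate

# Two ways to state the same gap:
shortfall_vs_actual = (actual - tracked) / actual   # tracked is this far below actual
excess_vs_tracked = actual / tracked - 1            # actual is this far above tracked
```

For this input the tracked figure sits 37.5% below the actual cost, which is
the same gap as the actual cost being 60% above the tracked one.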
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* feat(azure): add azure/gpt-5.5 + azure/gpt-5.5-pro entries (+ dated variants)
Azure variants of OpenAI's GPT-5.5 family. Microsoft has not yet
shipped GPT-5.5 on Azure OpenAI (latest GA on the Foundry models page
is GPT-5.4 as of 2026-04-24), but adding the entries day-0 mirrors the
established precedent for azure/gpt-5.4* (which were in the cost map
before the Azure rollout) so cost tracking and capability flags work
the moment customers deploy.
Schema follows the existing azure/gpt-5.4* shape:
- Same base/long-context pricing as openai/gpt-5.5*: $5/$30 chat,
$60/$360 pro per 1M, with priority tier 2x base
- Azure variants drop the flex/batches keys (Azure has no flex tier)
but keep priority pricing, matching gpt-5.4* precedent
- mode=chat for the thinking model, mode=responses for pro
reasoning_effort capability flags mirror the OpenAI variants exactly
since Azure proxies the same API contract: minimal rejection on both
chat and pro, low/none rejection on pro. Once #26456 (which sets
supports_low_reasoning_effort + minimal=false on openai/gpt-5.5*)
lands, OpenAI and Azure flag profiles align.
Tests pin entry presence + pricing for all four Azure variants and
verify the live-API-derived reasoning_effort flags.
* test: register supports_low_reasoning_effort in cost-map JSON schema
azure/gpt-5.5-pro and azure/gpt-5.5-pro-2026-04-23 added in this branch
carry supports_low_reasoning_effort=false. The strict
'additionalProperties: false' schema in
test_aaamodel_prices_and_context_window_json_is_valid rejected the new
key. Register it alongside the other supports_*_reasoning_effort
entries.
Note: the runtime side of this flag (code that reads it) lands in
#26456. Until that PR merges the flag is inert for both Azure and
OpenAI pro entries, but having the schema accept it lets cost-map
tests pass on either merge order.
* feat(proxy): add /v1/memory CRUD endpoints with user/team scoping
New LiteLLM_MemoryTable stores user/team-scoped key/value entries with
optional JSON metadata. Value is a String (LLM-readable text) and metadata
is an optional Json? envelope, matching the Letta + mem0 hybrid model so
future structured fields can be added without a schema migration.
Endpoints:
POST /v1/memory - create
GET /v1/memory - list (caller-scoped; admins see all)
GET /v1/memory/{key} - fetch one
PUT /v1/memory/{key} - upsert
DELETE /v1/memory/{key} - delete
Non-admin callers cannot set a user_id/team_id other than their own.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(proxy/memory): omit metadata field when None on create
Prisma's Python client rejects `metadata=None` on a `Json?` field with
"A value is required but not set" — the field must be omitted from the
`data` dict entirely to store SQL NULL. Build the create payload
conditionally in both `create_memory` and the PUT-create branch of
`upsert_memory`.
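A minimal sketch of the conditional payload, assuming a hypothetical
build_create_payload helper (the real code builds the dict inline in the
two call sites named above):

```python
from typing import Any, Dict, Optional


def build_create_payload(
    key: str,
    value: str,
    metadata: Optional[Dict[str, Any]] = None,
    user_id: Optional[str] = None,
    team_id: Optional[str] = None,
) -> Dict[str, Any]:
    data: Dict[str, Any] = {"key": key, "value": value}
    if user_id is not None:
        data["user_id"] = user_id
    if team_id is not None:
        data["team_id"] = team_id
    # Leave "metadata" out entirely when it is None so the column is
    # stored as SQL NULL instead of passing an explicit None.
    if metadata is not None:
        data["metadata"] = metadata
    return data
```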
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ui): add Memory page to view/manage /v1/memory entries
Adds a new "Memory" sidebar item under Tools so users can see what their
agents have stored. Lists all memories visible to the caller (scoped by
the backend), with a key-search filter, preview column, scope tags, and
view/edit/delete actions. Create modal accepts optional JSON metadata.
- networking.tsx: fetchMemoryList / createMemory / updateMemory / deleteMemory
wired to the /v1/memory CRUD endpoints.
- MemoryView + MemoryEditModal: new antd-based components (per CLAUDE.md:
use antd for new UI, not tremor).
- page.tsx + leftnav.tsx: wire the "memory" route + sidebar entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(memory): add key_prefix filter + promote Memory to AI GATEWAY nav
Backend:
- GET /v1/memory now accepts `key_prefix` for Redis-style namespace
scans (e.g. `?key_prefix=user:`). When both `key` and `key_prefix`
are passed, `key_prefix` wins.
- Prefix filter sits under the visibility filter in the Prisma where
clause, so it can never leak rows across user/team scopes.
- New tests: prefix match, and cross-scope isolation (another user's
`user:*` rows must not appear in the caller's results).
UI:
- Memory moved from a Tools submenu to a top-level AI GATEWAY item
(alongside Agents, MCP Servers, Skills) — it's an API primitive,
not a tool-management surface.
- Search box now drives prefix search, matching the Redis mental
model ("type the namespace, see everything under it").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): enforce unique key per scope by using NULLS NOT DISTINCT
The unique constraint `(key, user_id, team_id)` on LiteLLM_MemoryTable
silently allowed duplicates when user_id or team_id was NULL, because
Postgres treats every NULL as distinct by default (ANSI semantics). A
caller with no team_id could POST the same key three times and get
three rows.
Migration:
1. Dedupe existing rows, keeping the most recent per (key, user_id,
team_id), using `IS NOT DISTINCT FROM` so NULL == NULL.
2. Drop the old unique index.
3. Recreate it with `NULLS NOT DISTINCT` (Postgres 15+).
No code change: POST already returns 409 on unique-violation error
messages — it just wasn't firing before because the constraint didn't
catch the NULL-team case.
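The NULL-is-distinct behaviour can be reproduced locally. This sketch uses
SQLite (via the standard library) rather than Postgres, since SQLite's
unique indexes also treat NULLs as distinct; the table and index names are
illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (key TEXT, user_id TEXT, team_id TEXT)")
conn.execute(
    "CREATE UNIQUE INDEX memory_scope ON memory (key, user_id, team_id)"
)


def insert(key, user_id, team_id):
    conn.execute(
        "INSERT INTO memory VALUES (?, ?, ?)", (key, user_id, team_id)
    )


# Same key, same user, NULL team: the unique index accepts every copy.
for _ in range(3):
    insert("prefs", "alice", None)

null_team_rows = conn.execute(
    "SELECT COUNT(*) FROM memory WHERE key = 'prefs' AND team_id IS NULL"
).fetchone()[0]

# With no NULL in the tuple, the same index does reject a duplicate.
insert("prefs", "alice", "t1")
try:
    insert("prefs", "alice", "t1")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```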
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): make key globally unique, 409 on any duplicate
Switches from the compound unique `(key, user_id, team_id)` to a simple
`key @unique`. The compound form silently allowed duplicates when
user_id or team_id was NULL (Postgres treats each NULL as distinct), so
callers could POST the same key repeatedly. Globally-unique key means
one row per key, period — any duplicate create → 409.
- schema.prisma (×3): `key String @unique`, drop `@@unique(...)`.
- initial add_memory_table migration: unique index on (key) only.
- Remove the now-unused follow-up NULLS NOT DISTINCT migration.
- Endpoint error message simplified ("already exists" — no "for this scope").
- Test fake's create() now enforces global key uniqueness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): full-width layout + user/teams-style columns
- Add `w-full` to the MemoryView outer div so the page fills the
`flex-1` container (was collapsing to intrinsic width).
- Replace the combined "Scope" column with separate User ID / Team ID
columns, matching the layout of the Users / Teams pages: ID, Name,
Preview, User ID, Team ID, Updated, Actions.
- IDs render with a truncated mono label + copy-to-clipboard button,
same pattern as view_users.
- Detail drawer now shows Memory ID / User ID / Team ID as separate
fields instead of stacked color tags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): use clean MCP-style ID pill, drop copy icons
The ID / User ID / Team ID columns showed a mono text blob with a
copy-to-clipboard icon next to each value — too busy compared to the
MCP Servers page. Swap the renderer for MCP's pill style:
- Truncated mono ID inside a blue Tailwind pill
(`font-mono text-blue-600 bg-blue-50 ... rounded-md border`).
- No copy icon. Full ID surfaces via tooltip.
- ID column is a button that opens the detail drawer on click;
user/team ID pills are static (not clickable).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): address greptile review feedback
Addresses 5 greptile findings (3/5 → higher confidence target):
1. Identity-less orphan rows (P1): non-admin callers with no user_id AND
no team_id could create rows that the visibility filter would never
match again. Now rejected up front with 400 — caller must authenticate
with a scoped key or act as PROXY_ADMIN.
2. Upsert race returning 500 (P1): PUT's check-then-create isn't atomic;
a concurrent writer could slip a row in between the 404-check and the
create call. Now catch unique-violation on create, re-read, and fall
through to update — PUT stays idempotent. If the conflicting row
belongs to a different scope, surface a 409 instead of 500.
3. PUT-create scope inconsistency (P2): PUT's create branch always used
the caller's own user_id/team_id, so admins couldn't bootstrap rows
scoped elsewhere via PUT (only POST). Now PUT-create calls the shared
`_resolve_scope()` helper, matching POST semantics.
4. Stale schema comment (P2): schema said "Keyed by (key, user_id,
team_id)" but `key` is globally unique. Updated all three schema
copies to reflect the actual design.
5. UI silently truncated at 200 (P2): MemoryView fetched pageSize=200
with no load-more. Swapped to real server-side pagination driven by
`data.total`; page size is now 50 and the pager is a real AntD
control.
Also extracts a shared `_resolve_scope()` helper and `_is_unique_violation()`
from create_memory so POST and PUT don't drift on the scope/error logic.
Tests: +3 new (identity-less 400, PUT admin bootstrap, PUT race →
update), 18/18 pass.
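A sketch of the race recovery from finding 2, using an in-memory fake
instead of Prisma. FakeStore, UniqueViolation and the before_create hook are
hypothetical; the scope check that yields 409 is left out for brevity:

```python
class UniqueViolation(Exception):
    """Stand-in for the database's unique-constraint error."""


class FakeStore:
    def __init__(self):
        self.rows = {}

    def find(self, key):
        return self.rows.get(key)

    def create(self, key, value):
        if key in self.rows:
            raise UniqueViolation(key)
        self.rows[key] = value
        return "created"

    def update(self, key, value):
        self.rows[key] = value
        return "updated"


def upsert(store, key, value, before_create=None):
    if store.find(key) is None:
        if before_create is not None:
            before_create()  # test hook: simulate a concurrent writer
        try:
            return store.create(key, value)
        except UniqueViolation:
            # Lost the race: the row exists now, so fall through to update.
            pass
    return store.update(key, value)
```

The caller sees "updated" rather than an error when a concurrent writer wins
the create, which is what keeps PUT idempotent.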
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): typed Prisma error + explicit-null metadata on PUT
Two more greptile threads from the last review:
- Unique-violation detection was string-matching "Unique"/"UniqueViolation"
in the exception message, fragile across Prisma/driver versions. Now
check the typed error `code == "P2002"` first, with string fallback.
- PUT could not distinguish "metadata omitted" from "metadata: null" —
both parsed as `None`, so callers had no way to clear stored metadata.
Switch to Pydantic v2's `model_fields_set` to tell which fields the
caller actually sent; explicit null now clears the column.
New tests:
- explicit null clears metadata
- omitted metadata preserves existing value
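The omitted-versus-null distinction, sketched with the standard library
only. Key presence in the parsed JSON body plays the role that
`model_fields_set` plays on a Pydantic v2 model; apply_update is a
hypothetical helper, not the endpoint code:

```python
import json


def apply_update(stored: dict, raw_body: str) -> dict:
    body = json.loads(raw_body)
    updated = dict(stored)
    if "value" in body:
        updated["value"] = body["value"]
    # Presence of the key, not its parsed value, decides whether the
    # column changes: an explicit null clears it, omission keeps it.
    if "metadata" in body:
        updated["metadata"] = body["metadata"]
    return updated
```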
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): send explicit null when user clears metadata
Addresses the remaining P1 from the last greptile review:
When the edit modal's metadata textarea was cleared and saved,
`metadataParsed` stayed `undefined`, `JSON.stringify` dropped the key
entirely, and the backend's `model_fields_set` guard therefore left
the stored metadata untouched — UI showed success but nothing changed.
Now: empty textarea on edit → send explicit `null` so the backend
sees `metadata` in `model_fields_set` and clears the column.
Empty textarea on create still maps to `undefined` (field omitted)
to avoid Prisma's `Json? = None` quirk on insert.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): preserve slashes in key path encoding
The backend route `/v1/memory/{key:path}` supports keys with slashes,
but `encodeURIComponent` encoded `/` as `%2F`. Some proxies (nginx
default, Cloudflare, AWS ALB) reject or re-decode `%2F` mid-flight,
so UI update/delete calls on slash-containing keys could fail or
silently misroute.
New helper `encodeMemoryKeyForPath` splits by `/`, URL-encodes each
segment, then rejoins with literal `/`. Every other unsafe char
(spaces, `?`, `#`, `%`) stays encoded per-segment; slashes stay as
path delimiters, matching what the `:path` converter expects.
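The same per-segment encoding, sketched in Python for illustration (the
actual helper is TypeScript in the UI). `urllib.parse.quote` with `safe=""`
is used as an approximation of `encodeURIComponent`; the two differ on a
few punctuation characters:

```python
from urllib.parse import quote


def encode_memory_key_for_path(key: str) -> str:
    # Encode each segment on its own, then rejoin with literal slashes
    # so "/" keeps acting as a path delimiter.
    return "/".join(quote(segment, safe="") for segment in key.split("/"))
```

For example, `encode_memory_key_for_path("a/b c?d")` yields `a/b%20c%3Fd`.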
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui/memory): drop misleading client-side column sorters
With server-side pagination, client sorters on `key` and `updated_at`
only reorder the current page while pretending to sort the full
dataset — users would see "sorted by name" but only the visible 50
rows would actually be sorted.
Remove the sorters. The backend already returns rows in
`updated_at DESC` order (sensible default for a memory view), and
users can narrow the result with the key-prefix filter.
Greptile also flagged missing `@@map` on the new model as a
"consistency" issue, but only 1 of 59 tables in this repo uses
`@@map` — the dominant pattern is to rely on Prisma's default
(model name == table name). Skipping that finding as a
false-positive on convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): compose visibility + key filters via explicit AND
Greptile P1 (filter-fragility): `where.update(vis)` was semantically
correct today, but dict-merging by key meant any future visibility
filter that grew a new top-level "OR" would silently clobber the
existing key filter.
Compose explicitly instead:
where = {"AND": [key_filter, vis]}
Applied to both `list_memory` and `_find_memory_for_caller`. When
either side is empty (admin has no visibility filter; list has no
key filter), skip the wrapper and use the non-empty side directly
to keep the generated SQL clean.
Test fake's `_matches` now understands top-level `AND` too.
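A small sketch of why the explicit AND is safer than a dict merge. The
filter shapes are illustrative, and compose_where is a hypothetical name
for the composition described above:

```python
def compose_where(key_filter: dict, visibility: dict) -> dict:
    # Wrap only when both sides are present; otherwise use the
    # non-empty side directly (or {} when both are empty).
    if key_filter and visibility:
        return {"AND": [key_filter, visibility]}
    return key_filter or visibility


visibility = {"OR": [{"user_id": "alice"}, {"team_id": "t1"}]}

# Hypothetical future key filter that also uses a top-level "OR".
key_filter = {"OR": [{"key": "prefs"}, {"key": "notes"}]}

# dict.update merges by key, so the visibility "OR" overwrites the key "OR".
merged = dict(key_filter)
merged.update(visibility)

# Explicit composition keeps both filters intact.
composed = compose_where(key_filter, visibility)
```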
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(ui/memory): wrap write helpers with react-query useMutation
Previously the Memory view read via `useQuery` but called the raw
create/update/delete fetch helpers directly in handlers, tracking
loading state with a local `submitting` flag and invalidating state
via `refetch()`. That mixes two concerns:
- it skips react-query's mutation state (isPending / isError / isSuccess)
- `refetch()` only refetches the currently-mounted query instance, not
other cached pages, so navigating back to an older page could show
stale rows
Switch the three write paths to `useMutation`:
- `createMutation`, `updateMutation`, `deleteMutation` — each owns
the mutation fn, success toast, and error toast.
- Success handlers invalidate the whole `["memoryList", ...]` prefix
via `queryClient.invalidateQueries`, so every cached page refetches
(pagination + filter-aware).
- Refresh button now invalidates instead of `refetch()`, keeping all
behavior consistent.
- handleSave/handleDelete become thin adapters that call `.mutateAsync`;
their errors are swallowed locally since the mutation's onError has
already surfaced the toast.
Also tightened the edit modal's key-field tooltip to reflect the
actual global-unique semantics (was "Unique per user/team scope").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): close cross-user write gap + sanitize 500 errors (Veria)
Addresses two Veria findings:
**High — cross-user memory tampering via team membership.** The
visibility filter uses an OR (`user_id == caller.user_id OR team_id == caller.team_id`)
so team members can SEE each other's team-scoped rows. That's
intentional for list/get. But because PUT/DELETE used the same filter
to find the target row, any team member could overwrite or delete a
teammate's *personal* row whenever both `user_id` and `team_id` were
stamped on it — broader visibility was being silently treated as
broader authority.
New `_assert_write_access(row, caller)` enforces ownership for
mutations. Non-admin rules:
- The row's `user_id` must match the caller (personal ownership), OR
- The row has no `user_id` and its `team_id` matches the caller's
team (a "pure team row" intended for shared writes).
Admins bypass the check. The same gate runs in PUT (both regular
and post-race-recovery branches) and DELETE.
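The ownership rules as of this commit, in sketch form. Caller, Forbidden and
the synchronous signature are assumptions for illustration; the real
`_assert_write_access` may differ in shape:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caller:
    user_id: Optional[str]
    team_id: Optional[str]
    is_admin: bool = False


class Forbidden(Exception):
    status_code = 403


def assert_write_access(row: dict, caller: Caller) -> None:
    if caller.is_admin:
        return  # admins bypass the check
    row_user = row.get("user_id")
    if row_user is not None:
        # Personal row: only its owner may mutate it, even when a
        # team_id is also stamped on the row.
        if row_user == caller.user_id:
            return
        raise Forbidden("not the owner of this memory")
    # Pure team row: no user_id, team must match the caller's team.
    row_team = row.get("team_id")
    if row_team is not None and row_team == caller.team_id:
        return
    raise Forbidden("no write access to this memory")
```

Visibility and write authority are decided separately here: a row can be
readable through the OR filter and still fail this check.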
**Medium — DB internals leaked through 500 detail.** Every `except`
block was raising `HTTPException(500, detail=str(e))`, which surfaces
Prisma error strings (table/column names, host:port, error class
names) to API callers. New `_internal_error()` helper logs the real
exception server-side and returns a generic, caller-safe `detail`.
Applied to create, list, upsert (general fallthrough), and delete.
Also tightened the race-recovery 409 message to drop the "in a
different scope" wording — the caller never needs to know whose
scope it lives in.
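A sketch of the log-then-genericize pattern behind `_internal_error()`.
HTTPError is a local stand-in for the framework's HTTP exception, and the
logger name and message wording are assumptions:

```python
import logging

logger = logging.getLogger("memory_endpoints_sketch")


class HTTPError(Exception):
    """Local stand-in for an HTTP exception with status and detail."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def internal_error(exc: Exception, action: str) -> HTTPError:
    # The real exception goes to the server log only.
    logger.error("memory %s failed: %r", action, exc)
    # The caller gets a fixed message with nothing derived from exc.
    return HTTPError(500, f"Internal error during memory {action}.")
```

Usage would look like `raise internal_error(e, "create")` inside an `except`
block, replacing `detail=str(e)`.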
Tests (+5):
- teammate cannot overwrite personal row → 403
- teammate cannot delete personal row → 403
- teammate CAN modify pure team row (no user_id stamped) → 200
- admin bypasses write-auth → 200
- 500 response never echoes Prisma internals (table/host/class names)
25/25 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(memory): require team admin to modify pure team rows
Tightens the write-authorization rule for "pure team rows" (rows with
no user_id stamped, only team_id) to match the pattern used by
team-management endpoints (`_is_user_team_admin` + `_is_user_org_admin_for_team`):
- Plain team members can READ team rows via the OR visibility filter
(intentional, unchanged).
- Only PROXY_ADMIN, team admins of the row's team_id, or org admins
for the team's organization may MODIFY them. Plain members get 403.
`_assert_write_access` is now async and takes the prisma_client so it
can fetch the team and run the existing `_is_user_team_admin` /
`_is_user_org_admin_for_team` helpers from
`litellm.proxy.management_endpoints.common_utils`. The org-admin path
is best-effort: it calls `get_user_object`, which depends on the
proxy_server module being initialized, so any exception there is
treated as "not an org admin" rather than crashing the request.
Tests:
- team admin can modify pure team row → 200
- plain team member cannot modify pure team row → 403
- plain team member cannot delete pure team row → 403
Updates the test fake to add a tiny `litellm_teamtable.find_unique`
implementation and a `_make_team(team_id, admin_user_ids=[...])`
helper.
27/27 unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: mypy + UI page-metadata sync for memory page
Two CI failures:
1. mypy: `_find_memory_for_caller` had `key_filter` inferred as
`dict[str, str]` (literal type) and the conditional `{"AND": [key_filter, vis]}`
returned `dict[str, list[...]]`, so the join site failed
`dict-item` typing. Annotate both intermediates as `dict` so mypy
widens the value type.
2. UI test (`page_utils.test.ts > should have descriptions for all
pages`): every leftnav entry must have a description in
`page_metadata.ts`, and `memory` was missing. Added a one-line
description, matching the style of neighboring entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [Feat] Day-0 support for GPT-5.5 and GPT-5.5 Pro (#26449)
* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro
Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:
- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
per 1M input/output/cached input
Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.
No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.
Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields
* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants
gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.
Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.
Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.
* fix(schema): close LiteLLM_MemoryTable model brace dropped during merge
The rebase against `litellm_internal_staging` (which added
`LiteLLM_AdaptiveRouterState` / `LiteLLM_AdaptiveRouterSession`) left
the closing brace of `LiteLLM_MemoryTable` missing in all three
schema copies — the next model declaration ended up parsed as a field
of the memory table, surfacing as the CI prisma error:
    error: This line is not a valid field or attribute definition.
      --> schema.prisma:1250
         |
    1249 | // Per-(router, request_type, model) Beta posterior for the adaptive router.
    1250 | model LiteLLM_AdaptiveRouterState {
Add the missing `}` (and the standard blank line) after the memory
table's `@@index([team_id])` in `schema.prisma`,
`litellm/proxy/schema.prisma`, and
`litellm-proxy-extras/litellm_proxy_extras/schema.prisma`.
`prisma generate --schema litellm/proxy/schema.prisma` now runs clean;
27/27 memory unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
gpt-5.5-pro only accepts reasoning_effort in {medium, high, xhigh}
(verified live against OpenAI's API on 2026-04-24). LiteLLM previously
had no way to express this constraint: the existing JSON schema
covered none/minimal/xhigh but not low. As a result, even users with
drop_params=True saw an avoidable 400 from OpenAI.
Add supports_low_reasoning_effort following the existing opt-out
pattern (default-allow, explicit false to block). Mirror the minimal
branch in OpenAIGPT5Config.map_openai_params so 'low' goes through the
same _is_reasoning_effort_level_explicitly_disabled gate.
Set the flag to false on gpt-5.5-pro and gpt-5.5-pro-2026-04-23 in
both model_prices JSON files (kept in sync). Other models leave the
key absent so behavior is unchanged.
Tests cover: rejection on pro variants (no drop_params), drop on pro
with drop_params=True, passthrough on gpt-5.5 chat, passthrough on
unknown models, and the helper-level _is_reasoning_effort_level_explicitly_disabled
contract.
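The opt-out pattern above (default-allow, explicit false to block) can be sketched as follows. The names `is_level_explicitly_disabled`, `apply_reasoning_effort`, and `UnsupportedEffortError` are hypothetical stand-ins, not LiteLLM's helpers or exception types, and the dict-based model info is an assumption.

```python
class UnsupportedEffortError(ValueError):
    """Hypothetical local error raised instead of round-tripping for a 400."""


def is_level_explicitly_disabled(model_info: dict, level: str) -> bool:
    # Opt-out semantics: only an explicit False blocks the level.
    # A missing key (None) or True leaves the level allowed.
    return model_info.get(f"supports_{level}_reasoning_effort") is False


def apply_reasoning_effort(params: dict, model_info: dict, drop_params: bool = False) -> dict:
    """Return the params to send, dropping or rejecting a disabled effort level."""
    effort = params.get("reasoning_effort")
    if effort is None or not is_level_explicitly_disabled(model_info, effort):
        return dict(params)
    if drop_params:
        # Silently remove the unsupported parameter.
        return {k: v for k, v in params.items() if k != "reasoning_effort"}
    raise UnsupportedEffortError(
        f"reasoning_effort={effort!r} is not supported by this model"
    )
```

With `{"supports_low_reasoning_effort": False}` as the model info, a request for 'low' is dropped when drop_params is set and rejected locally otherwise, while models that leave the key absent pass the value through unchanged.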
Verified against OpenAI's live Chat Completions API on 2026-04-24:
POST /v1/chat/completions
{"model": "gpt-5.5", "reasoning_effort": "minimal", ...}
-> 400 Unsupported value: 'reasoning_effort' does not support 'minimal'
with this model. Supported values are: 'none', 'low', 'medium',
'high', and 'xhigh'.
POST /v1/chat/completions
{"model": "gpt-5.5-pro", "reasoning_effort": "minimal", ...}
-> 400 Unsupported value: 'minimal' is not supported with the
'gpt-5.5-pro' model. Supported values are: 'medium', 'high', and
'xhigh'.
Set supports_minimal_reasoning_effort=false on all four entries
(gpt-5.5, gpt-5.5-2026-04-23, gpt-5.5-pro, gpt-5.5-pro-2026-04-23) so
OpenAIGPT5Config._is_reasoning_effort_level_explicitly_disabled fires
and LiteLLM either drops the param (drop_params=True) or raises a
local UnsupportedParamsError, instead of round-tripping to OpenAI for
a 400.
Adds a parametrized test_gpt55_reasoning_effort_flags_match_live_openai_api
test that pins supports_{none,minimal,xhigh}_reasoning_effort on each
entry to OpenAI's actual API contract.
Note: gpt-5.5-pro additionally rejects 'none' and 'low'. 'none' is
already handled (supports_none_reasoning_effort=false). 'low' is not
representable in the current JSON schema (no supports_low flag);
filing separately.
* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro
Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:
- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
per 1M input/output/cached input
Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.
No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.
Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields
* feat: add gpt-5.5 to model cost map
Add gpt-5.5 entry with pricing from OpenAI flagship page:
input $5/1M, cached input $0.50/1M, output $30/1M, 272K context.
* test: add gpt-5.5 coverage for model cost map and gpt-5 routing
- Add gpt-5.5 to GPT5_MODELS parametrized list so both OpenAIGPT5Config
and AzureOpenAIGPT5Config routing tests cover the new model.
- Add test_generic_cost_per_token_gpt55 verifying the new entry's
cost-map values ($5/$0.50/$30 per 1M) and that generic_cost_per_token
returns the expected prompt/completion costs.
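For reference, the arithmetic behind those cost-map values can be worked through in a few lines. This is a minimal sketch, not generic_cost_per_token itself; `estimate_cost` is a hypothetical helper, and it assumes cached tokens are counted as a subset of the input tokens.

```python
from decimal import Decimal

# gpt-5.5 prices from this commit, in USD per 1M tokens.
PRICES_PER_MILLION_USD = {
    "input": Decimal("5"),
    "cached_input": Decimal("0.50"),
    "output": Decimal("30"),
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0):
    """Return (prompt_cost, completion_cost) in USD as Decimals."""
    million = Decimal(1_000_000)
    # Assumption: cached tokens are part of input_tokens and billed at the cached rate.
    uncached = input_tokens - cached_input_tokens
    prompt_cost = (
        uncached * PRICES_PER_MILLION_USD["input"]
        + cached_input_tokens * PRICES_PER_MILLION_USD["cached_input"]
    ) / million
    completion_cost = output_tokens * PRICES_PER_MILLION_USD["output"] / million
    return prompt_cost, completion_cost
```

For 1,000 input tokens and 500 output tokens this gives $0.005 for the prompt and $0.015 for the completion; if 400 of those input tokens were cached, the prompt cost drops to $0.0032.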
* fix(model-info): include reasoning effort support fields in get_model_info
_get_model_info_helper constructs ModelInfoBase explicitly but never
reads supports_xhigh/minimal/none_reasoning_effort from the cost map
JSON. Add the three fields so get_model_info() returns them correctly.
Also add supports_minimal_reasoning_effort to the ModelInfo TypedDict
(xhigh and none were already declared, minimal was missing).
* fix(model-registry): add missing reasoning effort fields for claude 4.6/4.7
Claude Opus 4.7 supports max reasoning effort (above xhigh).
The supports_max_reasoning_effort field was present for Opus 4.6 but
missing for all Opus 4.7 entries (base, dated, Bedrock, Vertex AI,
Azure AI).
All Claude 4.6/4.7 models (Opus 4.6, Sonnet 4.6, Opus 4.7) support
minimal reasoning effort via adaptive thinking. Add
supports_minimal_reasoning_effort to all provider variants.
* fix(adapter): map output_config.effort to reasoning_effort (#25079)
Anthropic's adaptive thinking (thinking.type="adaptive") and
output_config.effort were silently dropped when translating to
OpenAI format, resulting in no reasoning_effort on the outgoing
request.
Adapter changes (format translation):
- adapters/transformation.py: add "adaptive" branch to
translate_anthropic_thinking_to_reasoning_effort(); pass through
output_config.effort as-is in _translate_thinking_to_openai();
add "output_config" to translatable_anthropic_params
- adapters/handler.py: extract output_config from extra_kwargs into
request_data so it reaches the translation layer
- responses_adapters/transformation.py: add "adaptive" branch and
output_config param to translate_thinking_to_reasoning()
Handler changes (model-aware normalization):
- utils.py: add normalize_reasoning_effort_value() that uses
get_model_info() to map "max" → "xhigh"/"high" and
"minimal" → "minimal"/"low" based on model capabilities
- adapters/handler.py: call normalization before responses routing
- responses_adapters/handler.py: call normalization after translation
Relates to BerriAI/litellm#25079
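The model-aware normalization described above can be sketched as a pair of fallback rules. `normalize_effort` below is a hypothetical simplification, not normalize_reasoning_effort_value(): it takes a capabilities dict directly instead of calling get_model_info(), and the exact order of the checks is an assumption.

```python
def normalize_effort(effort: str, capabilities: dict) -> str:
    """Map Anthropic-style effort levels onto what the target model accepts."""
    if effort == "max":
        # "max" degrades to "xhigh" when supported, otherwise to "high".
        if capabilities.get("supports_xhigh_reasoning_effort"):
            return "xhigh"
        return "high"
    if effort == "minimal":
        # "minimal" is kept when supported, otherwise raised to "low".
        if capabilities.get("supports_minimal_reasoning_effort"):
            return "minimal"
        return "low"
    # Levels such as "low", "medium", and "high" pass through unchanged.
    return effort
```

A model with no capability flags therefore receives "high" for "max" and "low" for "minimal", which keeps the request valid at the cost of a slightly different effort level.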
* test(reasoning-effort): add tests for effort capability fields and normalize logic
Test coverage for:
- get_model_info returning supports_minimal/max_reasoning_effort fields
- JSON registry entries for claude 4.6/4.7 across all providers
- normalize_reasoning_effort_value degradation chains and exception fallback
- Adapter translation of adaptive thinking + output_config.effort
* fix: forward custom_llm_provider to normalize_reasoning_effort_value in responses adapter
- Validate max effort like xhigh: Opus 4.6/4.7 id patterns or supports_max_reasoning_effort
- Set supports_max_reasoning_effort on claude-opus-4-7 entries in model cost JSON
- Update tests and add test_max_effort_accepted_for_opus_47
Made-with: Cursor
* add moonshot/kimi-k2.6 to model registry
* add moonshot/kimi-k2.6 to backup model registry
* add tests for moonshot/kimi-k2.6 model registry
* fix moonshot/kimi-k2.6 pricing and add reasoning support
* fix moonshot/kimi-k2.6 pricing and add reasoning support in backup
* update kimi-k2.6 tests: fix pricing, add tool_choice and reasoning checks
* fix: load kimi-k2.6 registry tests from local backup instead of remote cost map
* Add supported providers to prompt caching doc
* Move Z.ai / GLM to cache_control marker list
* Mark xAI models as supporting prompt caching
* Narrow xAI prompt caching flag to models with documented cache pricing
* Add prompt caching flag to grok-4, grok-4-0709, grok-4-latest
---------
Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local>