- Introduce RoutingPrismaWrapper that transparently routes read operations (find_*, count, group_by, query_raw, query_first) to a reader endpoint while writes remain on the writer, enabling Aurora-style reader/writer endpoint splits
- Add IAMEndpoint dataclass and parse_iam_endpoint_from_url() to capture static connection fields from a reader URL so only the IAM token needs to rotate, avoiding the need for separate DATABASE_HOST_READ_REPLICA/etc. env vars
- Enhance PrismaWrapper with per-instance knobs (db_url_env_var, iam_endpoint, recreate_uses_datasource, log_prefix) so writer and reader wrappers are independent: the reader writes its fresh URL to DATABASE_URL_READ_REPLICA and passes a datasource override to Prisma, since Prisma only auto-reads DATABASE_URL
- Fix deadlock in PrismaWrapper.__getattr__: when called from inside a running event loop, schedule the token refresh as a background task instead of blocking with run_coroutine_threadsafe + future.result(), which would deadlock the loop thread waiting for a coroutine that needs the loop to run
- Fix botocore crash when DATABASE_PORT is unset by defaulting to "5432" in both proxy_cli.py and PrismaWrapper.get_rds_iam_token(); passing None caused botocore to embed the literal string "None" in the presigned URL
- Implement graceful reader degradation: reader connect/recreate failures are non-fatal; wrapper sets _reader_unavailable=True and silently routes reads to the writer to keep the proxy serving traffic during transient reader outages
- Add PrismaClient.writer_db property so the reconnect smoke-test always validates the writer engine specifically; query_raw on the routing wrapper would route to the reader and not verify the newly-recreated writer
- Expose DATABASE_URL_READ_REPLICA in Helm chart (values.yaml + deployment.yaml) via both plain value and secret key reference, and document the field in docker-compose.yml
- Add 887-line test suite covering routing logic, IAM token refresh paths, reader degradation scenarios, datasource override behavior, and the deadlock regression
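The deadlock fix described in the list above can be illustrated with a minimal sketch. The names `refresh_token` and `ensure_fresh` are hypothetical and do not come from the change; the sketch only shows the scheduling decision, and the behavior outside a running loop is an assumption.

```python
import asyncio
from typing import Optional


async def refresh_token(state: dict) -> None:
    """Hypothetical stand-in for the IAM token refresh coroutine."""
    await asyncio.sleep(0)  # yield to the loop, as real network I/O would
    state["refreshed"] = True


def ensure_fresh(state: dict) -> Optional[asyncio.Task]:
    """Trigger a refresh without ever blocking a running event loop thread."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so blocking is safe.
        # Assumption: the sync path may simply run the coroutine to completion.
        asyncio.run(refresh_token(state))
        return None
    # A loop is running in this thread. Blocking on future.result() here would
    # stall the very loop that has to execute the coroutine, so schedule the
    # refresh as a background task and return immediately.
    return loop.create_task(refresh_token(state))
```

Called from synchronous code, `ensure_fresh` returns `None` after the refresh completes; called from a coroutine, it returns a task that finishes once the loop gets a chance to run it.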
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Operators upgrading past 35bbca60b0 (which made /metrics auth
default-on) see "Malformed API Key passed in. Ensure Key has 'Bearer '
prefix." with no hint that
litellm_settings.require_auth_for_metrics_endpoint: false restores the
previous unauthenticated behavior. Append that discovery hint to the
existing 401 body so a Prometheus scraper that breaks after upgrade
has a clear migration path. No behavior change.
* feat(sso): show full IdP claims in /sso/debug/callback
The debug callback only displayed the proxy-parsed OpenID summary, so
customers couldn't verify what custom claims (team_id, team_alias, roles,
etc.) the IdP was actually returning. Render two new sections — Raw
Claims (userinfo) and Access Token Claims (decoded JWT) — alongside the
existing parsed view. Strip bearer tokens as defense-in-depth in case a
non-conforming IdP places them in its userinfo response.
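A rough sketch of the token-stripping step, under stated assumptions: the key list, the placeholder string, and the `redact_tokens` name are all hypothetical, since the message does not say which fields are stripped or how.

```python
from typing import Any

# Hypothetical key list; the fields actually stripped are not given above.
SENSITIVE_KEYS = {"access_token", "id_token", "refresh_token"}
REDACTED = "[REDACTED]"  # hypothetical placeholder


def redact_tokens(claims: Any) -> Any:
    """Return a copy of the claims with token-bearing fields replaced."""
    if isinstance(claims, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact_tokens(value)
            for key, value in claims.items()
        }
    if isinstance(claims, list):
        return [redact_tokens(item) for item in claims]
    return claims  # scalars pass through unchanged
```

Recursing into nested dicts and lists covers an IdP that nests a token inside a custom claim object; the input is never mutated, so the raw response stays available to the caller.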
Resolves LIT-2838
* Update litellm/proxy/management_endpoints/ui_sso.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix(sso): hoist json.dumps out of f-string for py3.10 ruff
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Restore lineage between main and internal staging so the next
staging->main promotion (#27436) can merge without conflicts.
main was 2 commits ahead:
- 6ff668c7aa squash-merge of the previous staging->main promotion (#27245)
- 8c9830eef9 feat(xai): add grok-4.3 (#27396), already present on staging
The squash-merge has no shared lineage with the individual commits that
went into staging, which is why git surfaced 13 textual conflicts despite
both sides having the same logical content. Every conflicting file's
main-side change came from 6ff668c7aa only, and the matching staging-side
changes are the post-promotion evolution. Resolved all 13 with --ours
(staging's version is the latest evolution; main's snapshot is stale).
The grok-4.3 entries auto-merged in model_prices_and_context_window.json
and its backup, but were already on staging via an independent commit, so
the net diff vs HEAD is empty for those files.
Net new content from this merge: 12 lines added to
ui/litellm-dashboard/package-lock.json -- npm 11 libc array tags on four
existing entries, no functional impact.
Merge of cve-sweep-2026-05 into litellm_yj_may7 picked the older
npm@11.12.1 line, regressing the bump in f08b1b63fa that cleared
ip-address GHSA-v2v4-37r5-5v8g (npm@11.12.1 bundles ip-address@10.1.0;
11.14.0 bundles 10.1.1).
Google updated their Interactions OpenAPI spec
(https://ai.google.dev/static/api/interactions.openapi.json), removing
the readOnly 'outputs' property from CreateModelInteractionParams in
favor of 'steps' (a polymorphic transcript array). The compliance test
fetches the live spec, so it began failing on every PR once the spec
flipped over. Update the asserted output-field list to match.
Note: this only re-aligns the spec-shape assertion. Our SDK response
types (litellm/types/interactions/generated.py) still expose 'outputs'
and need to be regenerated separately to add 'steps'/Step variants and
decide on a back-compat path for callers reading .outputs.
The entry was added to cover the now-reverted black 24.10.0 -> 26.3.1
bump. With the bump dropped, upstream's existing liccheck setup is
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Revert disk_cache.py JSONDisk swap + remove test_disk_cache.py. The
JSONDisk migration is backwards-incompatible (existing pickle caches
become unreadable; non-JSON values raise unguarded TypeError on set)
and warrants its own focused PR with a feature flag rather than riding
along with the CVE/dep-bump sweep.
CVE-2025-69872 remains unmitigated at the diskcache layer; users
concerned about pickle-RCE on cache-dir writers can avoid Cache(type="disk")
or pin a fork until upstream ships a fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical grype scan of the built runtime image flagged
ip-address@10.1.0 (Medium) bundled inside /usr/local/lib/node_modules/npm.
npm@11.14.0 bundles ip-address@10.1.1 which carries the fix.
Verified by rebuilding the image and rescanning: ip-address finding gone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pin the new digest published 2026-05-05:
sha256:31da6565... (from sha256:3258be47...).
Delta in baseline packages: zlib 1.3.2-r2 -> 1.3.2-r3. glibc stays at
2.43-r7, still the latest available. Any further glibc fixes for
CVE-2026-5450 / CVE-2026-5928 that Chainguard ships in -r8 or later are
out of scope for this PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR was blocked by .github/workflows/guard-fork-dependencies.yml: fork PRs
cannot modify uv.lock. Reverting:
- uv.lock + pyproject.toml black bump (24.10.0 -> 26.3.1) and the 295
files of mechanical Black 26 reformat coupled to it
- pyproject.toml diskcache extra change (kept the runtime mitigation in
litellm/caching/disk_cache.py via JSONDisk)
Kept:
- Dockerfile cache narrowing (drops ~660 MB of uv build cache that
surfaced cached setuptools as CVE findings)
- litellm/caching/disk_cache.py: dc.JSONDisk to neutralize CVE-2025-69872
- ui/litellm-dashboard/package-lock.json + litellm-js/spend-logs/package-lock.json:
next/postcss/hono/uuid CVE bumps (these are not blocked by the fork guard)
- tests/test_litellm/caching/test_disk_cache.py
- tests/code_coverage_tests/liccheck.ini: harmless black authorization
Black + gitpython + langchain dep upgrades will need a follow-up from a
maintainer pushing a branch in the canonical BerriAI/litellm repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strip VCR wiring from the batches test conftest. Drops:
- import of `_vcr_conftest_common` helpers
- the `vcr_config` fixture, `pytest_recording_configure`,
`_vcr_outcome_gate`, `pytest_runtest_makereport`
- the `apply_vcr_auto_marker_to_items` call in
`pytest_collection_modifyitems`
- `VerboseReporterState` / its `pytest_configure` /
`pytest_runtest_logreport` hooks (purely VCR-verdict plumbing)
Why: every test in this directory creates ephemeral OpenAI / Bedrock /
vLLM resources whose IDs change per run (file-XXX, batch-XXX,
ft-XXX, ...). VCR's path/query/body matchers don't match across runs,
so `record_mode="new_episodes"` was silently passing through to the
live API and recording many new cassette entries every run. The result
was cassette bloat with no replay benefit.
Behaviour after this change is identical to running the directory
without `CASSETTE_REDIS_URL` set: tests that have keys hit live APIs,
tests that don't continue to skip via their existing skipif markers.
Conftest now keeps only path setup and the session-scoped `event_loop`
fixture.
OpenAI announced gpt-3.5-turbo-0125 (and fine-tuning of gpt-3.5-turbo
in general) for shutdown on 2026-10-23, with the announcement landing
2026-04-22. The hard-fail date is ~5 months out, but the timing fits the
recent uptick in flakes for this test, and OpenAI may already be running
the deprecated model's pipeline on deprioritized infra.
Bump to gpt-4o-mini-2024-07-18 — currently supported for fine-tuning,
no announced shutdown. Updates the live test plus the mocked test for
consistency. Belt-and-suspenders with the existing propagation-retry
helper.
Previous fix polled `litellm.afile_retrieve` for `status == "processed"`
before calling the fine-tuning endpoint. That doesn't actually solve
the race:
- OpenAI's `FileObject.status` field is deprecated per the SDK type and
not authoritative — it can read "processed" before the file is usable.
- The retrieve and fine-tuning endpoints don't share a consistency
model, so retrieve succeeding tells you nothing about FT visibility.
Replace with a retry around the actual `acreate_fine_tuning_job` call
that catches the OpenAI 400 `'file-... does not exist'` and backs off
exponentially (1s → cap 8s, 12 attempts, ~70s total budget). The
operation succeeding is the only reliable signal that propagation
finished.
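The retry schedule above can be sketched as follows. The helper names and the injectable `sleep` parameter are illustrative assumptions, not the test's actual helper; the schedule (1s doubling to an 8s cap, 12 attempts) is taken from the description and adds up to 71 seconds of sleep across 11 waits.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int = 12, base: float = 1.0, cap: float = 8.0) -> List[float]:
    """Sleep durations between attempts: 1, 2, 4, 8, 8, ... (attempts - 1 entries)."""
    return [min(base * 2**i, cap) for i in range(attempts - 1)]


async def retry_until_visible(
    op: Callable[[], Awaitable[T]],
    is_propagation_error: Callable[[Exception], bool],
    attempts: int = 12,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry `op` while it fails with the propagation error, backing off between tries."""
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        try:
            return await op()
        except Exception as exc:
            # Re-raise anything that is not the propagation race, and give up
            # after the final attempt instead of sleeping again.
            if attempt == attempts - 1 or not is_propagation_error(exc):
                raise
            await sleep(delays[attempt])
    raise AssertionError("unreachable when attempts >= 1")
```

In the test, `op` would wrap the `acreate_fine_tuning_job` call and `is_propagation_error` would match the 400 whose message contains `does not exist`; both are passed in here so the sketch stays independent of the OpenAI SDK.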
OpenAI file uploads are eventually consistent — a freshly uploaded file
may briefly 404 from `retrieve` and is rejected by the fine-tuning
endpoint with `'file-... does not exist'` until processing finishes.
The async fine-tuning test called `acreate_fine_tuning_job` immediately
after `acreate_file` and flaked on this race.
Add a polling helper that waits up to ~30s for `status=processed` (and
short-circuits on `error`), called between upload and FT job creation.
Mirrors the same propagation lag covered by the `await asyncio.sleep(1)`
in the sister batches test, but more robust against longer delays.
OpenAI deprecated the gpt-4o-realtime-preview-2024-10-01 snapshot,
which caused these E2E tests to fail consistently in CI. Bump to the
unversioned gpt-4o-realtime-preview alias to match the sibling
test_openai_realtime_simple.py and stay current as OpenAI rolls the
alias forward.
- Add `_strip_image_b64_payloads` filter: rewrites `data[*].b64_json` in
image-gen responses to a 4-byte placeholder before the cassette is saved.
Image-edit and image-gen cassettes (193 MB / 184 MB / 104 MB / ...) will
shrink to <100 KB on next record. Tests assert response shape only, so
coverage is preserved.
- Add `_normalize_multipart_boundary` filter: replaces httpx's per-request
random multipart boundary with a fixed string in both Content-Type header
and body bytes. Audio-transcription / Whisper tests have been effectively
unmocked — every CI run hit live providers and was silently capped at
MAX_EPISODES_PER_CASSETTE=50. Both record and replay now see identical
bytes; the safe_body matcher works.
- Fix test_evals_api.py body poisoning: replace `int(time.time())` in eval
names with `hashlib.sha1(test_node_name)[:12]`, add a function-scoped
`managed_eval` fixture that creates and deletes the eval, and switch
`get_eval` / `update_eval` from `list_evals().data[0].id` (which made
the URL vary by run) to `managed_eval.id`. Net coverage gain: delete is
now actually exercised.
- Swap arxiv PDF URL in BaseOCRTest for the in-repo `dummy.pdf` (589 B)
served via sha-pinned jsdelivr.
- Swap etsystatic image URL in BaseLLMChatTest.test_image_url for the
in-repo LiteLLM logo (9.2 KB) served via the same jsdelivr pin.
- Add `tests/llm_translation/test_vcr_filters.py` with 14 unit tests
covering both new filters: replacement, idempotency, nesting, content-
length update, two-distinct-boundaries-converge-after-normalize, etc.
Cassettes recorded with the prior patterns will mismatch on the first CI
run after merge; recommend flushing the cassette Redis once (post-merge)
so re-records save under the new format from the start.
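For orientation, the two cassette filters described above might look roughly like this. It is a simplified sketch, not the code in the change: the function names, the placeholder value, and the fixed boundary string are hypothetical, and the content-length update mentioned in the unit tests is omitted.

```python
import json
import re
from typing import Tuple

PLACEHOLDER = "AAAA"  # hypothetical 4-byte placeholder
FIXED_BOUNDARY = "vcr-fixed-boundary"  # hypothetical fixed boundary string
_BOUNDARY_RE = re.compile(r'boundary="?([^";]+)"?')


def strip_b64_json(body: bytes) -> bytes:
    """Replace data[*].b64_json in a JSON response body with a tiny placeholder."""
    try:
        payload = json.loads(body)
    except ValueError:  # not JSON (or not valid UTF-8): leave untouched
        return body
    items = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        return body
    for item in items:
        if isinstance(item, dict) and "b64_json" in item:
            item["b64_json"] = PLACEHOLDER
    return json.dumps(payload).encode()


def normalize_boundary(content_type: str, body: bytes) -> Tuple[str, bytes]:
    """Swap the random multipart boundary for a fixed one in header and body."""
    match = _BOUNDARY_RE.search(content_type)
    if match is None:
        return content_type, body
    original = match.group(1)
    return (
        content_type.replace(original, FIXED_BOUNDARY),
        body.replace(original.encode(), FIXED_BOUNDARY.encode()),
    )
```

Both functions are idempotent, and two requests that differ only in their random boundary normalize to identical bytes, which is the property a body matcher needs for replay.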
* feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json
xAI's docs page now lists grok-4.3 as the recommended chat / coding model:
"We strongly recommend all API callers use grok-4.3. It is the most
intelligent and fastest model we've built." (https://docs.x.ai/docs/models)
Pricing/specs sourced from xAI's published model metadata:
- input: $1.25 / 1M tokens (<=200k), $2.50 / 1M tokens (>200k)
- output: $2.50 / 1M tokens (<=200k), $5.00 / 1M tokens (>200k)
- cached: $0.20 / 1M tokens (<=200k), $0.40 / 1M tokens (>200k)
- context: 1,000,000 tokens
- capabilities: vision, reasoning, function calling, structured outputs,
prompt caching, web search
Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias),
mirroring the pattern used for the rest of the xAI/Grok-4 family.
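As a worked example of the tiered rates listed above, here is a small estimator. It rests on two assumptions that the message does not state: the prompt size (uncached plus cached input tokens) selects the tier, and the selected tier's rate applies to every token in the request. `estimate_cost` is a hypothetical helper, not a LiteLLM API.

```python
TIER_THRESHOLD = 200_000  # tokens; boundary between the two price tiers

# USD per 1M tokens as (<=200k, >200k), copied from the list above.
RATES = {
    "input": (1.25, 2.50),
    "output": (2.50, 5.00),
    "cached": (0.20, 0.40),
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate request cost in USD under the assumptions stated above."""
    # Assumption: the whole request is billed at the tier chosen by prompt size.
    tier = 1 if input_tokens + cached_tokens > TIER_THRESHOLD else 0
    total = (
        input_tokens * RATES["input"][tier]
        + output_tokens * RATES["output"][tier]
        + cached_tokens * RATES["cached"][tier]
    )
    return total / 1_000_000
```

Under these assumptions, a request with 100k input and 10k output tokens costs about $0.15, while 300k input and 10k output tokens costs about $0.80.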
* test(xai): add model_info test for grok-4.3 + sync backup cost map
- Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into
litellm/model_prices_and_context_window_backup.json so the bundled
model cost map matches the canonical model_prices_and_context_window.json.
- Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering
pricing tiers, capability flags, context window, provider routing,
and parity between the main and backup cost maps.
- Point 'source' at the live xAI models page (the per-model URL
https://docs.x.ai/docs/models/grok-4.3 currently 404s).
---------
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: shin-watcher <shin-watcher@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* Refactor Bedrock response stream shape handling
- Introduced a module-level constant `BEDROCK_RESPONSE_STREAM_SHAPE` to cache the response stream shape, eliminating the need for per-instance caching in `BedrockEventStreamDecoderBase`.
- Updated relevant methods to utilize the new constant, improving performance by avoiding redundant loading of the shape.
- Added tests to ensure the shape is loaded correctly at import time and is consistent across different modules.
- Added a new mock server script for testing Bedrock pass-through functionality.
* Refactor response parsing for Bedrock and SageMaker
- Improved code readability by formatting the parsing method calls in `AWSEventStreamDecoder` for both Bedrock and SageMaker response stream shapes.
- Added blank lines for better separation of code blocks in `invoke_handler.py` and `common_utils.py` to enhance maintainability.
* Enhance error handling for Bedrock and SageMaker response stream shape loading
- Wrapped the loading logic in `_load_bedrock_response_stream_shape` and `_load_sagemaker_response_stream_shape` with try-except blocks to gracefully handle exceptions.
- Added logging to warn when the response stream shape cannot be pre-loaded, ensuring the module imports cleanly.
- Updated tests to verify that loading failures return `None` instead of propagating exceptions.
* Implement error handling for missing response stream shapes in Bedrock and SageMaker
- Added checks in `_parse_message_from_event` methods to raise appropriate errors when `BEDROCK_RESPONSE_STREAM_SHAPE` or `SAGEMAKER_RESPONSE_STREAM_SHAPE` is None, ensuring clearer error reporting.
- Updated logging messages to reflect the unavailability of event-stream decoding for both Bedrock and SageMaker.
- Enhanced unit tests to verify that the correct exceptions are raised when the response stream shapes are not loaded.
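The load-once, fail-soft pattern described in these commits can be sketched as below. The loader is a hypothetical stand-in (the real one reads the botocore service model), and both the `RuntimeError` type and the message wording are assumptions, since the commits only say "appropriate errors".

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def load_shape_safely(loader: Callable[[], Any], name: str) -> Optional[Any]:
    """Run the loader once; on failure log a warning and return None."""
    try:
        return loader()
    except Exception as exc:
        # Never let a loading problem break module import.
        logger.warning("Could not pre-load %s response stream shape: %s", name, exc)
        return None


def _hypothetical_loader() -> dict:
    """Stand-in for the real shape loader."""
    return {"name": "ResponseStream"}


# Module-level constant: loaded once at import time, shared by all decoders.
RESPONSE_STREAM_SHAPE = load_shape_safely(_hypothetical_loader, "bedrock")


def parse_event(event: dict, shape: Optional[Any] = RESPONSE_STREAM_SHAPE) -> dict:
    """Fail with a clear error if the shape was never loaded."""
    if shape is None:
        raise RuntimeError(
            "event-stream decoding is unavailable: response stream shape was not loaded"
        )
    return {"shape": shape["name"], "event": event}
```

Deferring the failure from import time to parse time keeps unrelated code paths importable, while callers that do need event-stream decoding still get an explicit error rather than an `AttributeError` on `None`.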