Commit Graph

39037 Commits

Author SHA1 Message Date
Yassin Kortam
b5d3a5fc85
feat: add read-replica routing for Prisma DB via DATABASE_URL_READ_REPLICA (#27493)
- Introduce RoutingPrismaWrapper that transparently routes read operations (find_*, count, group_by, query_raw, query_first) to a reader endpoint while writes remain on the writer, enabling Aurora-style reader/writer endpoint splits
- Add IAMEndpoint dataclass and parse_iam_endpoint_from_url() to capture static connection fields from a reader URL so only the IAM token needs to rotate, avoiding the need for separate DATABASE_HOST_READ_REPLICA/etc. env vars
- Enhance PrismaWrapper with per-instance knobs (db_url_env_var, iam_endpoint, recreate_uses_datasource, log_prefix) so writer and reader wrappers are independent: the reader writes its fresh URL to DATABASE_URL_READ_REPLICA and passes a datasource override to Prisma, since Prisma only auto-reads DATABASE_URL
- Fix deadlock in PrismaWrapper.__getattr__: when called from inside a running event loop, schedule the token refresh as a background task instead of blocking with run_coroutine_threadsafe + future.result(), which would deadlock the loop thread waiting for a coroutine that needs the loop to run
- Fix botocore crash when DATABASE_PORT is unset by defaulting to "5432" in both proxy_cli.py and PrismaWrapper.get_rds_iam_token(); passing None caused botocore to embed the literal string "None" in the presigned URL
- Implement graceful reader degradation: reader connect/recreate failures are non-fatal; wrapper sets _reader_unavailable=True and silently routes reads to the writer to keep the proxy serving traffic during transient reader outages
- Add PrismaClient.writer_db property so the reconnect smoke-test always validates the writer engine specifically; query_raw on the routing wrapper would route to the reader and not verify the newly-recreated writer
- Expose DATABASE_URL_READ_REPLICA in Helm chart (values.yaml + deployment.yaml) via both plain value and secret key reference, and document the field in docker-compose.yml
- Add an 887-line test suite covering routing logic, IAM token refresh paths, reader degradation scenarios, datasource override behavior, and the deadlock regression
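A minimal sketch of the read/write split and the graceful-degradation fallback described above, assuming a single flattened client object. `RoutingClient`, `EchoClient`, and `mark_reader_failed` are hypothetical names, not the actual `RoutingPrismaWrapper` API; in Prisma Client Python the `find_*`, `count`, and `group_by` actions live on per-model accessors, which this sketch does not model.

```python
from typing import Any, Optional

# Exact method names treated as reads (from the commit message); any name
# starting with "find_" is also treated as a read.
_READ_METHODS = frozenset({"count", "group_by", "query_raw", "query_first"})


def is_read_operation(name: str) -> bool:
    return name.startswith("find_") or name in _READ_METHODS


class RoutingClient:
    """Hypothetical sketch: reads go to a reader client, everything else to the writer."""

    def __init__(self, writer: Any, reader: Optional[Any] = None) -> None:
        self.writer = writer
        self.reader = reader
        # No reader configured behaves exactly like an unavailable reader.
        self.reader_unavailable = reader is None

    def mark_reader_failed(self) -> None:
        # Reader failures are non-fatal: from now on, reads fall back to the writer.
        self.reader_unavailable = True

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the instance or class itself.
        if is_read_operation(name) and not self.reader_unavailable:
            return getattr(self.reader, name)
        return getattr(self.writer, name)


class EchoClient:
    """Stand-in for a database client; each method reports which endpoint served it."""

    def __init__(self, label: str) -> None:
        self.label = label

    def find_many(self) -> str:
        return self.label

    def query_raw(self, sql: str) -> str:
        return self.label

    def create(self) -> str:
        return self.label
```

Routing on the attribute name keeps call sites unchanged: code written against the writer keeps working, and flipping `reader_unavailable` sends all reads back to the writer without touching callers.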

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-08 21:05:50 -07:00
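The deadlock fix hinges on one rule: code already running on an event loop must never block that loop while waiting for a coroutine that needs the same loop. A hypothetical sketch of the pattern; `TokenRefresher` and `maybe_refresh` are illustrative names, and the `asyncio.run` fallback for the no-loop case is an assumption, not necessarily what `PrismaWrapper.__getattr__` does.

```python
import asyncio
from typing import Optional


class TokenRefresher:
    """Hypothetical stand-in for a wrapper that refreshes an IAM token from sync code."""

    def __init__(self) -> None:
        self.refresh_count = 0
        self._pending: Optional["asyncio.Task[None]"] = None

    async def refresh_token(self) -> None:
        await asyncio.sleep(0)  # placeholder for the real token fetch + reconnect
        self.refresh_count += 1

    def maybe_refresh(self) -> None:
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            loop = None

        if loop is None:
            # No loop running in this thread: blocking until the refresh finishes is safe.
            asyncio.run(self.refresh_token())
            return

        # Inside a running loop. Blocking on run_coroutine_threadsafe(...).result()
        # here would deadlock: the coroutine can only run on this same loop thread,
        # which would be stuck waiting. Schedule it and return immediately instead.
        if self._pending is None or self._pending.done():
            self._pending = loop.create_task(self.refresh_token())
```

Keeping a reference in `_pending` stops a second refresh from being scheduled while one is still in flight, and keeps the task from being garbage-collected before it runs.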
yuneng-jiang
0bcff0214a
Merge pull request #27502 from BerriAI/litellm_/trusting-hoover-2bbbc8
fix(proxy): point /metrics 401 at the opt-out flag
2026-05-08 18:31:04 -07:00
yuneng-jiang
0824c4c77e
Merge pull request #27403 from BerriAI/litellm_otelGenaiCaptureMessageContent
[Feat] Honor OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
2026-05-08 18:17:00 -07:00
Yuneng Jiang
4f3608b15a
fix(proxy): point /metrics 401 at the opt-out flag
Operators upgrading past 35bbca60b0 (which made /metrics auth
default-on) see "Malformed API Key passed in. Ensure Key has 'Bearer '
prefix." with no hint that
litellm_settings.require_auth_for_metrics_endpoint: false restores the
previous unauthenticated behavior. Append that discovery hint to the
existing 401 body so a Prometheus scraper that breaks after upgrade
has a clear migration path. No behavior change.
2026-05-08 18:09:14 -07:00
ryan-crabbe-berri
13a193367f
feat(sso): show full IdP claims in /sso/debug/callback (#27498)
* feat(sso): show full IdP claims in /sso/debug/callback

The debug callback only displayed the proxy-parsed OpenID summary, so
customers couldn't verify what custom claims (team_id, team_alias, roles,
etc.) the IdP was actually returning. Render two new sections — Raw
Claims (userinfo) and Access Token Claims (decoded JWT) — alongside the
existing parsed view. Strip bearer tokens defense-in-depth in case a
non-conforming IdP places them in its userinfo response.

Resolves LIT-2838
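A sketch of how a debug view could build the Access Token Claims section and strip tokens from a userinfo payload, assuming the access token is a standard three-segment JWT. The helper names and the `_TOKEN_KEYS` denylist are hypothetical. Skipping signature verification is acceptable here only because the output is displayed, never trusted for authorization.

```python
import base64
import json
from typing import Any, Dict

# Hypothetical denylist of token-bearing keys to drop before rendering.
_TOKEN_KEYS = frozenset({"access_token", "id_token", "refresh_token"})


def decode_jwt_claims_unverified(token: str) -> Dict[str, Any]:
    """Decode a JWT's payload segment for display only. The signature is NOT checked."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a compact JWT with three dot-separated segments")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def strip_bearer_tokens(claims: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the claims without any raw token values."""
    return {k: v for k, v in claims.items() if k not in _TOKEN_KEYS}
```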

* Update litellm/proxy/management_endpoints/ui_sso.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(sso): hoist json.dumps out of f-string for py3.10 ruff

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-05-08 17:39:39 -07:00
oss-agent-shin
f2e97380d2
Add OpenRouter Qwen 3.6 Plus metadata (#27486)
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-08 16:25:45 -07:00
oss-agent-shin
ae67cecc22
Allow team admins to test model connections (#27487)
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-08 15:30:41 -07:00
Parijat Sharma
144279eb57
fix(ui): URL-encode team_id in teamInfoCall to handle special characters (#27466)
2026-05-08 10:46:46 -07:00
yuneng-jiang
98cd057f38
Merge pull request #27437 from BerriAI/litellm_merge_main_staging
chore: merge main into internal_staging to restore lineage
2026-05-07 17:57:05 -07:00
Yuneng Jiang
faba1fda8a
Merge remote-tracking branch 'origin/main' into litellm_merge_main_staging
Restore lineage between main and internal staging so the next
staging->main promotion (#27436) can merge without conflicts.

main was 2 commits ahead:
  - 6ff668c7aa  squash-merge of the previous staging->main promotion (#27245)
  - 8c9830eef9  feat(xai): add grok-4.3 (#27396), already present on staging

The squash-merge has no shared lineage with the individual commits that
went into staging, which is why git surfaced 13 textual conflicts despite
both sides having the same logical content. Every conflicting file's
main-side change came from 6ff668c7aa only, and the matching staging-side
changes are the post-promotion evolution. Resolved all 13 with --ours
(staging's version is the latest evolution; main's snapshot is stale).

The grok-4.3 entries auto-merged in model_prices_and_context_window.json
and its backup, but were already on staging via an independent commit, so
the net diff vs HEAD is empty for those files.

Net new content from this merge: 12 lines added to
ui/litellm-dashboard/package-lock.json -- npm 11 libc array tags on four
existing entries, no functional impact.
2026-05-07 17:49:48 -07:00
yuneng-jiang
ee8d8c4137
Merge pull request #27431 from BerriAI/yj_bump_may7
[Infra] Bump versions
2026-05-07 17:40:22 -07:00
Yuneng Jiang
1f1963f1d0
Merge remote-tracking branch 'origin' into yj_bump_may7
2026-05-07 17:34:13 -07:00
yuneng-jiang
a6195cc7d7
Merge pull request #27433 from BerriAI/litellm_yj_may7
[Infra] Merge dev branch
2026-05-07 17:33:42 -07:00
Yuneng Jiang
5e2c283604
fix(docker): restore npm@11.14.0 lost in merge resolution
Merge of cve-sweep-2026-05 into litellm_yj_may7 picked the older
npm@11.12.1 line, regressing the bump in f08b1b63fa that cleared
ip-address GHSA-v2v4-37r5-5v8g (npm@11.12.1 bundles ip-address@10.1.0;
11.14.0 bundles 10.1.1).
2026-05-07 17:25:10 -07:00
yuneng-jiang
5082f9bc71
Merge pull request #27225 from stuxf/cve-sweep-2026-05
[Security] Clear AWS Inspector CVE findings on Docker image
2026-05-07 17:20:11 -07:00
yuneng-jiang
309cc36f9d
Merge branch 'litellm_yj_may7' into cve-sweep-2026-05
2026-05-07 17:19:37 -07:00
yuneng-jiang
a20c020101
Merge pull request #27432 from BerriAI/litellm_/elegant-mestorf-549737
test(interactions): align openapi compliance with upstream rename outputs->steps
2026-05-07 17:18:27 -07:00
yuneng-jiang
d351abd76b
Merge pull request #27430 from BerriAI/litellm_fix/remove-separate-health-app
fix: remove separate health app
2026-05-07 17:13:52 -07:00
Yuneng Jiang
3d67f00ede
test(interactions): align openapi compliance with upstream rename outputs->steps
Google updated their Interactions OpenAPI spec
(https://ai.google.dev/static/api/interactions.openapi.json), removing
the readOnly 'outputs' property from CreateModelInteractionParams in
favor of 'steps' (a polymorphic transcript array). The compliance test
fetches the live spec, so it began failing on every PR once the spec
flipped over. Update the asserted output-field list to match.

Note: this only re-aligns the spec-shape assertion. Our SDK response
types (litellm/types/interactions/generated.py) still expose 'outputs'
and need to be regenerated separately to add 'steps'/Step variants and
decide on a back-compat path for callers reading .outputs.
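A hypothetical sketch of the kind of spec-shape assertion involved: list the `readOnly` properties of one schema in a parsed OpenAPI document and compare them against an expected list. The function name is invented, and the sketch assumes output fields are the ones marked `readOnly`.

```python
from typing import Any, Dict, List


def read_only_properties(spec: Dict[str, Any], schema_name: str) -> List[str]:
    """List the readOnly properties of one schema in an OpenAPI document, sorted."""
    schema = spec["components"]["schemas"][schema_name]
    return sorted(
        name
        for name, prop in schema.get("properties", {}).items()
        if isinstance(prop, dict) and prop.get("readOnly") is True
    )
```

For a schema whose `properties` mark only `id` and `steps` as `readOnly: true`, it returns `["id", "steps"]`; a live-spec test then fails as soon as upstream renames or removes a field.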
2026-05-07 17:09:25 -07:00
user
f74fd4382c
Drop liccheck black allowlist entry
The entry was added to cover the now-reverted black 24.10.0 -> 26.3.1
bump. With the bump dropped, upstream's existing liccheck setup is
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 00:04:11 +00:00
user
9a2aeb0a41
Drop diskcache changes from PR
Revert disk_cache.py JSONDisk swap + remove test_disk_cache.py. The
JSONDisk migration is backwards-incompatible (existing pickle caches
become unreadable; non-JSON values raise unguarded TypeError on set)
and warrants its own focused PR with a feature flag rather than riding
along with the CVE/dep-bump sweep.

CVE-2025-69872 remains unmitigated at the diskcache layer; users
concerned about pickle-based RCE by anyone who can write to the cache
directory can avoid Cache(type="disk") or pin a fork until upstream ships a fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 00:03:05 +00:00
user
f08b1b63fa
Bump npm 11.12.1 -> 11.14.0 to clear ip-address GHSA-v2v4-37r5-5v8g
Empirical grype scan of the built runtime image flagged
ip-address@10.1.0 (Medium) bundled inside /usr/local/lib/node_modules/npm.
npm@11.14.0 bundles ip-address@10.1.1 which carries the fix.

Verified by rebuilding the image and rescanning: ip-address finding gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 23:56:43 +00:00
user
4d6e2bc3da
Bump wolfi-base to latest (2026-05-05)
Pin the new digest published 2026-05-05:
sha256:31da6565... (from sha256:3258be47...).

Delta in baseline packages: zlib 1.3.2-r2 -> 1.3.2-r3. glibc stays at
2.43-r7 (still the latest available). Any further glibc fixes for
CVE-2026-5450 / CVE-2026-5928 that Chainguard ships in -r8+ are outside
the scope of this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 23:36:04 +00:00
Yuneng Jiang
086a23753e
uv lock
2026-05-07 16:30:15 -07:00
Yuneng Jiang
44aecb6f66
bump: version 1.84.0 → 1.85.0
2026-05-07 16:28:33 -07:00
yuneng-jiang
40a490aed7
Merge pull request #27241 from BerriAI/litellm_/nifty-kilby-82870d
[Infra] Packaging: Relax Core Runtime Pins To Ranges
2026-05-07 16:15:57 -07:00
Yuneng Jiang
9ae9b81c1b
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/nifty-kilby-82870d
# Conflicts:
#	uv.lock
2026-05-07 16:10:22 -07:00
Yassin Kortam
451ce161fc
fix: remove separate health app
2026-05-07 16:04:56 -07:00
user
5bafa8b3a2
Drop dep bumps + black-26 reformat to clear fork CI policy
PR was blocked by .github/workflows/guard-fork-dependencies.yml: fork PRs
cannot modify uv.lock. Reverting:

- uv.lock + pyproject.toml black bump (24.10.0 -> 26.3.1) and the 295
  files of mechanical Black 26 reformat coupled to it
- pyproject.toml diskcache extra change (kept the runtime mitigation in
  litellm/caching/disk_cache.py via JSONDisk)

Kept:
- Dockerfile cache narrowing (drops ~660 MB of uv build cache that
  surfaced cached setuptools as CVE findings)
- litellm/caching/disk_cache.py: dc.JSONDisk to neutralize CVE-2025-69872
- ui/litellm-dashboard/package-lock.json + litellm-js/spend-logs/package-lock.json:
  next/postcss/hono/uuid CVE bumps (these are not blocked by the fork guard)
- tests/test_litellm/caching/test_disk_cache.py
- tests/code_coverage_tests/liccheck.ini: harmless black authorization

Black + gitpython + langchain dep upgrades will need a follow-up from a
maintainer pushing a branch in the canonical BerriAI/litellm repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 23:04:52 +00:00
user
63bda3f001
Merge remote-tracking branch 'upstream/litellm_internal_staging' into cve-sweep-2026-05
# Conflicts:
#	uv.lock
2026-05-07 23:03:28 +00:00
yuneng-jiang
0fb88d50dd
Merge pull request #27415 from BerriAI/litellm_/sweet-mcclintock-2b3656
[Fix] Realtime Tests: Update Deprecated OpenAI Model Pin
2026-05-07 15:46:51 -07:00
Yuneng Jiang
a43dc9f0b1
[Fix] Batches Tests: Remove VCR Auto-Marker
Strip VCR wiring from the batches test conftest. Drops:

- import of `_vcr_conftest_common` helpers
- the `vcr_config` fixture, `pytest_recording_configure`,
  `_vcr_outcome_gate`, `pytest_runtest_makereport`
- the `apply_vcr_auto_marker_to_items` call in
  `pytest_collection_modifyitems`
- `VerboseReporterState` / its `pytest_configure` /
  `pytest_runtest_logreport` hooks (purely VCR-verdict plumbing)

Why: every test in this directory creates ephemeral OpenAI / Bedrock /
vLLM resources whose IDs change per run (file-XXX, batch-XXX,
ft-XXX, ...). VCR's path/query/body matchers don't match across runs,
so `record_mode="new_episodes"` was silently passing through to the
live API and recording many new cassette entries every run. Cassette
bloat without replay benefit.

Behaviour after this change is identical to running the directory
without `CASSETTE_REDIS_URL` set: tests that have keys hit live APIs,
tests that don't continue to skip via their existing skipif markers.

Conftest now keeps only path setup and the session-scoped `event_loop`
fixture.
2026-05-07 15:39:46 -07:00
ishaan-berri
e4c14862fc
feat(mcp): add OBO MCP Auth (#27421)
* feat(mcp): add oauth2 token exchange auth

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

* fix(mcp): cache token exchange fallback

Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>

---------

Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-07 15:35:21 -07:00
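OBO (on-behalf-of) auth is commonly built on OAuth 2.0 Token Exchange (RFC 8693): the proxy trades the caller's access token for one scoped to the downstream MCP server. A minimal sketch assuming the feature follows RFC 8693; the grant and token-type URNs are the RFC's, while `ExchangedTokenCache`, the 30 s expiry skew, and the 300 s default TTL are hypothetical choices, not LiteLLM's implementation. The HTTP call is injected as `post_form` to keep the sketch self-contained.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Standard identifiers from RFC 8693 (OAuth 2.0 Token Exchange).
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_form(
    subject_token: str, audience: Optional[str] = None, scope: Optional[str] = None
) -> Dict[str, str]:
    """Token-endpoint form body exchanging the caller's token for a downstream one."""
    form = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
    }
    if audience:
        form["audience"] = audience
    if scope:
        form["scope"] = scope
    return form


class ExchangedTokenCache:
    """Hypothetical cache: one token-endpoint call per (subject, audience, scope) until expiry."""

    def __init__(
        self,
        post_form: Callable[[Dict[str, str]], Dict[str, Any]],
        clock: Callable[[], float] = time.monotonic,
        skew_seconds: float = 30.0,
        default_ttl: float = 300.0,  # assumption, used when the response omits expires_in
    ) -> None:
        self._post_form = post_form
        self._clock = clock
        self._skew = skew_seconds
        self._default_ttl = default_ttl
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, subject_token: str, audience: Optional[str] = None, scope: Optional[str] = None) -> str:
        # Hash the cache key so raw user tokens are not stored as dict keys.
        key = hashlib.sha256(f"{subject_token}|{audience}|{scope}".encode()).hexdigest()
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]
        resp = self._post_form(build_token_exchange_form(subject_token, audience, scope))
        token = resp["access_token"]
        ttl = float(resp.get("expires_in", self._default_ttl))
        # Expire a little early so a token is never handed out just before it lapses.
        self._entries[key] = (token, now + max(0.0, ttl - self._skew))
        return token
```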
Yuneng Jiang
5256a1fdb6
[Fix] Fine-Tuning Test: Bump Off Deprecated gpt-3.5-turbo-0125
OpenAI announced gpt-3.5-turbo-0125 (and fine-tuning of gpt-3.5-turbo
in general) for shutdown on 2026-10-23, with the announcement landing
2026-04-22. The hard-fail date is ~5 months out, but the timing fits the
recent uptick in this test's flakiness, and OpenAI may already be running
the deprecated model's pipeline on deprioritized infra.

Bump to gpt-4o-mini-2024-07-18 — currently supported for fine-tuning,
no announced shutdown. Updates the live test plus the mocked test for
consistency. Belt-and-suspenders with the existing propagation-retry
helper.
2026-05-07 14:41:58 -07:00
Yuneng Jiang
a8cad84dc7
[Fix] Fine-Tuning Test: Retry on File Propagation 400
Previous fix polled `litellm.afile_retrieve` for `status == "processed"`
before calling the fine-tuning endpoint. That doesn't actually solve
the race:

- OpenAI's `FileObject.status` field is deprecated per the SDK type and
  not authoritative — it can read "processed" before the file is usable.
- The retrieve and fine-tuning endpoints don't share a consistency
  model, so retrieve succeeding tells you nothing about FT visibility.

Replace with a retry around the actual `acreate_fine_tuning_job` call
that catches the OpenAI 400 `'file-... does not exist'` and backs off
exponentially (1s → cap 8s, 12 attempts, ~70s total budget). The
operation succeeding is the only reliable signal that propagation
finished.
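A minimal sketch of the retry described above, assuming the propagation error can be recognized by the `does not exist` substring in the exception message (the real test may also check the status code). `retry_while_file_missing` is a hypothetical name; `sleep` is injectable so the backoff schedule can be checked without waiting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_while_file_missing(
    op: Callable[[], Awaitable[T]],
    attempts: int = 12,
    base_delay: float = 1.0,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry op() while it fails with a 'does not exist' error; back off 1s, 2s, 4s, 8s, 8s, ..."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await op()
        except Exception as exc:
            # Any other error, or a failure on the last attempt, is re-raised unchanged.
            if "does not exist" not in str(exc) or attempt == attempts:
                raise
        await sleep(delay)
        delay = min(delay * 2, max_delay)
    raise RuntimeError("unreachable")
```

With the defaults, a call that never succeeds sleeps 11 times (1 + 2 + 4 + 8×8 = 71 s) before the twelfth failure is re-raised, which matches the ~70s budget quoted above.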
2026-05-07 14:17:45 -07:00
Yuneng Jiang
a64716ed5b
[Fix] Fine-Tuning Test: Wait for OpenAI File Propagation
OpenAI file uploads are eventually consistent — a freshly uploaded file
may briefly 404 from `retrieve` and is rejected by the fine-tuning
endpoint with `'file-... does not exist'` until processing finishes.
The async fine-tuning test called `acreate_fine_tuning_job` immediately
after `acreate_file` and flaked on this race.

Add a polling helper that waits up to ~30s for `status=processed` (and
short-circuits on `error`), called between upload and FT job creation.
Targets the same propagation lag that the `await asyncio.sleep(1)` in the
sister batches test covers, but is more robust against longer delays.
2026-05-07 14:06:04 -07:00
Yuneng Jiang
4189d78a64
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/sweet-mcclintock-2b3656
2026-05-07 13:46:53 -07:00
Yuneng Jiang
9e7dc5ef68
[Fix] Realtime Tests: Update Deprecated OpenAI Model Pin
OpenAI deprecated the gpt-4o-realtime-preview-2024-10-01 snapshot,
which caused these E2E tests to fail consistently in CI. Bump to the
unversioned gpt-4o-realtime-preview alias to match the sibling
test_openai_realtime_simple.py and stay current as OpenAI rolls the
alias forward.
2026-05-07 13:46:45 -07:00
michelligabriele
3b78a3a545
fix(chat-completions): decode unified file_id when model_file_id_mapping is unavailable (#27406)
* fix(chat-completions): decode unified file_id when model_file_id_mapping is unavailable

* fix(chat-completions): tolerate non-dict content items (e.g. token-ids from text_completion)
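A sketch of the tolerant iteration in the second fix, assuming OpenAI-style content parts where a file reference looks like `{"type": "file", "file": {"file_id": ...}}`. `iter_file_ids` is a hypothetical helper; decoding the unified `file_id` itself is not shown, since its encoding is not described here.

```python
from typing import Any, Iterator


def iter_file_ids(content: Any) -> Iterator[str]:
    """Yield file_id values from chat message content, skipping anything that is not a dict part."""
    if not isinstance(content, list):
        return  # plain-string content carries no file parts
    for item in content:
        if not isinstance(item, dict):
            continue  # e.g. integer token ids passed through from a text_completion prompt
        if item.get("type") != "file":
            continue
        file_part = item.get("file")
        if isinstance(file_part, dict) and isinstance(file_part.get("file_id"), str):
            yield file_part["file_id"]
```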
2026-05-07 13:17:04 -07:00
ishaan-berri
b891a201f8
Preserve LiteLLM headers for passthrough responses (#27412)
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com>
Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
2026-05-07 12:59:36 -07:00
Shivam Rawat
29e4eb16da
Merge pull request #27222 from BerriAI/litellm_s3AuditParams
[Feat] Decouple S3 audit-log config via s3_audit_callback_params
2026-05-07 12:49:03 -07:00
yuneng-jiang
b9b315157b
Merge pull request #27409 from BerriAI/litellm_/inspiring-allen-ec64a4
[Fix] Tests: Reduce VCR cassette bloat and fix multipart caching
2026-05-07 12:39:58 -07:00
Michael-RZ-Berri
db8198faba
[Fix] Allow non-admin compliance path reads (#27234)
* allow non-admin roles on /compliance/* read routes

* Restrict compliance routes to internal users

---------

Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-05-07 14:07:23 -05:00
Yuneng Jiang
2f9519d286
[Fix] Tests: Reduce VCR cassette bloat and fix multipart caching
- Add `_strip_image_b64_payloads` filter: rewrites `data[*].b64_json` in
  image-gen responses to a 4-byte placeholder before the cassette is saved.
  Image-edit and image-gen cassettes (193 MB / 184 MB / 104 MB / ...) will
  shrink to <100 KB on next record. Tests assert response shape only, so
  coverage is preserved.
- Add `_normalize_multipart_boundary` filter: replaces httpx's per-request
  random multipart boundary with a fixed string in both Content-Type header
  and body bytes. Audio-transcription / Whisper tests have been effectively
  unmocked — every CI run hit live providers and was silently capped at
  MAX_EPISODES_PER_CASSETTE=50. Both record and replay now see identical
  bytes; the safe_body matcher works.
- Fix test_evals_api.py body poisoning: replace `int(time.time())` in eval
  names with `hashlib.sha1(test_node_name)[:12]`, add a function-scoped
  `managed_eval` fixture that creates and deletes the eval, and switch
  `get_eval` / `update_eval` from `list_evals().data[0].id` (which made
  the URL vary by run) to `managed_eval.id`. Net coverage gain: delete is
  now actually exercised.
- Swap arxiv PDF URL in BaseOCRTest for the in-repo `dummy.pdf` (589 B)
  served via sha-pinned jsdelivr.
- Swap etsystatic image URL in BaseLLMChatTest.test_image_url for the
  in-repo LiteLLM logo (9.2 KB) served via the same jsdelivr pin.
- Add `tests/llm_translation/test_vcr_filters.py` with 14 unit tests
  covering both new filters: replacement, idempotency, nesting, content-
  length update, two-distinct-boundaries-converge-after-normalize, etc.

Cassettes recorded with the prior patterns will mismatch on the first CI
run after merge; recommend flushing the cassette Redis once (post-merge)
so re-records save under the new format from the start.
2026-05-07 11:54:19 -07:00
michelligabriele
9f1b41d206
fix(proxy): run model-level post_call guardrails on streaming requests (#26922)
2026-05-07 11:53:03 -07:00
Cursor Agent
492118de25
Fix OTEL false content capture mode
2026-05-07 18:12:23 +00:00
Michael Riad Zaky
581ae1443d
honor OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
2026-05-07 09:44:39 -07:00
Mateo Wang
8c9830eef9
feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_conte… (#27154) (#27396)
* feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json

xAI's docs page now lists grok-4.3 as the recommended chat / coding model:
"We strongly recommend all API callers use grok-4.3. It is the most
intelligent and fastest model we've built." (https://docs.x.ai/docs/models)

Pricing/specs sourced from xAI's published model metadata:
  - input:  $1.25 / 1M tokens (<=200k),  $2.50 / 1M tokens (>200k)
  - output: $2.50 / 1M tokens (<=200k),  $5.00 / 1M tokens (>200k)
  - cached: $0.20 / 1M tokens (<=200k),  $0.40 / 1M tokens (>200k)
  - context: 1,000,000 tokens
  - capabilities: vision, reasoning, function calling, structured outputs,
    prompt caching, web search

Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias),
mirroring the pattern used for the rest of the xAI/Grok-4 family.
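A worked cost estimate using the rates listed above. This assumes the prompt size selects one tier for the whole request (the commit does not say how the >200k tier is applied) and that cached tokens are a subset of prompt tokens billed at the cached rate; `grok_4_3_cost_usd` is a hypothetical helper, not LiteLLM's cost calculator.

```python
# Per-1M-token USD rates from the commit message: (<=200k tier, >200k tier).
GROK_4_3_RATES_PER_1M = {
    "input": (1.25, 2.50),
    "output": (2.50, 5.00),
    "cached_input": (0.20, 0.40),
}
TIER_THRESHOLD_TOKENS = 200_000


def grok_4_3_cost_usd(prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate request cost; assumes the prompt size picks one tier for the whole request."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    tier = 1 if prompt_tokens > TIER_THRESHOLD_TOKENS else 0
    uncached = prompt_tokens - cached_tokens
    total_per_1m = (
        uncached * GROK_4_3_RATES_PER_1M["input"][tier]
        + cached_tokens * GROK_4_3_RATES_PER_1M["cached_input"][tier]
        + completion_tokens * GROK_4_3_RATES_PER_1M["output"][tier]
    )
    return total_per_1m / 1_000_000
```

For example, a 100k-token prompt with a 10k-token completion comes to $0.125 + $0.025 = $0.15, while a 300k-token prompt (50k cached) with a 20k-token completion lands in the upper tier: $0.625 + $0.02 + $0.10 = $0.745.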

* test(xai): add model_info test for grok-4.3 + sync backup cost map

- Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into
  litellm/model_prices_and_context_window_backup.json so the bundled
  model cost map matches the canonical model_prices_and_context_window.json.
- Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering
  pricing tiers, capability flags, context window, provider routing,
  and parity between the main and backup cost maps.
- Point 'source' at the live xAI models page (the per-model URL
  https://docs.x.ai/docs/models/grok-4.3 currently 404s).



---------

Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: shin-watcher <shin-watcher@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-07 09:39:26 -07:00
ishaan-berri
fee5900acc
feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_conte… (#27154)
* feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json

xAI's docs page now lists grok-4.3 as the recommended chat / coding model:
"We strongly recommend all API callers use grok-4.3. It is the most
intelligent and fastest model we've built." (https://docs.x.ai/docs/models)

Pricing/specs sourced from xAI's published model metadata:
  - input:  $1.25 / 1M tokens (<=200k),  $2.50 / 1M tokens (>200k)
  - output: $2.50 / 1M tokens (<=200k),  $5.00 / 1M tokens (>200k)
  - cached: $0.20 / 1M tokens (<=200k),  $0.40 / 1M tokens (>200k)
  - context: 1,000,000 tokens
  - capabilities: vision, reasoning, function calling, structured outputs,
    prompt caching, web search

Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias),
mirroring the pattern used for the rest of the xAI/Grok-4 family.

* test(xai): add model_info test for grok-4.3 + sync backup cost map

- Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into
  litellm/model_prices_and_context_window_backup.json so the bundled
  model cost map matches the canonical model_prices_and_context_window.json.
- Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering
  pricing tiers, capability flags, context window, provider routing,
  and parity between the main and backup cost maps.
- Point 'source' at the live xAI models page (the per-model URL
  https://docs.x.ai/docs/models/grok-4.3 currently 404s).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

---------

Co-authored-by: shin-watcher <shin-watcher@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-05-07 09:06:56 -07:00
harish-berri
a67b7a7e87
Refactor Bedrock response stream shape handling (#27257)
* Refactor Bedrock response stream shape handling

- Introduced a module-level constant `BEDROCK_RESPONSE_STREAM_SHAPE` to cache the response stream shape, eliminating the need for per-instance caching in `BedrockEventStreamDecoderBase`.
- Updated relevant methods to utilize the new constant, improving performance by avoiding redundant loading of the shape.
- Added tests to ensure the shape is loaded correctly at import time and is consistent across different modules.
- Added a new mock server script for testing Bedrock pass-through functionality.

* Refactor response parsing for Bedrock and SageMaker

- Improved code readability by formatting the parsing method calls in `AWSEventStreamDecoder` for both Bedrock and SageMaker response stream shapes.
- Added blank lines for better separation of code blocks in `invoke_handler.py` and `common_utils.py` to enhance maintainability.

* Enhance error handling for Bedrock and SageMaker response stream shape loading

- Wrapped the loading logic in `_load_bedrock_response_stream_shape` and `_load_sagemaker_response_stream_shape` with try-except blocks to gracefully handle exceptions.
- Added logging to warn when the response stream shape cannot be pre-loaded, ensuring the module imports cleanly.
- Updated tests to verify that loading failures return `None` instead of propagating exceptions.

* Implement error handling for missing response stream shapes in Bedrock and SageMaker

- Added checks in `_parse_message_from_event` methods to raise appropriate errors when `BEDROCK_RESPONSE_STREAM_SHAPE` or `SAGEMAKER_RESPONSE_STREAM_SHAPE` is None, ensuring clearer error reporting.
- Updated logging messages to reflect the unavailability of event-stream decoding for both Bedrock and SageMaker.
- Enhanced unit tests to verify that the correct exceptions are raised when the response stream shapes are not loaded.
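A sketch of the load-once, fail-soft pattern described above. The real code loads a botocore shape; here that loader is replaced by a hypothetical `_hypothetical_bedrock_loader`, and `load_shape_safely` / `EventStreamDecoder` are illustrative names rather than the actual classes.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def load_shape_safely(loader: Callable[[], Any], name: str) -> Optional[Any]:
    """Run an expensive shape loader once; a failure is logged, never raised."""
    try:
        return loader()
    except Exception as exc:  # broad on purpose: importing the module must not fail
        logger.warning("Could not pre-load %s response stream shape: %s", name, exc)
        return None


def _hypothetical_bedrock_loader() -> Dict[str, str]:
    # Stand-in for the real (expensive) botocore service-model lookup.
    return {"name": "ResponseStream"}


# Loaded once at import time and shared by every decoder instance.
BEDROCK_RESPONSE_STREAM_SHAPE = load_shape_safely(_hypothetical_bedrock_loader, "Bedrock")


class EventStreamDecoder:
    def __init__(self, shape: Optional[Any] = BEDROCK_RESPONSE_STREAM_SHAPE) -> None:
        self.shape = shape

    def parse_message_from_event(self, event: bytes) -> Dict[str, Any]:
        if self.shape is None:
            raise RuntimeError(
                "Bedrock event-stream decoding unavailable: response stream shape was not loaded"
            )
        # Real parsing against self.shape is omitted in this sketch.
        return {"shape": self.shape, "raw": event}
```

Catching broadly at import time keeps the module importable when the shape cannot be loaded; the failure surfaces only when streaming is actually attempted, and with a clearer message than an `AttributeError` on `None`.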
2026-05-06 17:39:38 -07:00