litellm

Author	SHA1	Message	Date
Yassin Kortam	b5d3a5fc85	feat: add read-replica routing for Prisma DB via DATABASE_URL_READ_REPLICA (#27493) - Introduce RoutingPrismaWrapper that transparently routes read operations (find_*, count, group_by, query_raw, query_first) to a reader endpoint while writes remain on the writer, enabling Aurora-style reader/writer endpoint splits - Add IAMEndpoint dataclass and parse_iam_endpoint_from_url() to capture static connection fields from a reader URL so only the IAM token needs to rotate, avoiding the need for separate DATABASE_HOST_READ_REPLICA/etc. env vars - Enhance PrismaWrapper with per-instance knobs (db_url_env_var, iam_endpoint, recreate_uses_datasource, log_prefix) so writer and reader wrappers are independent: the reader writes its fresh URL to DATABASE_URL_READ_REPLICA and passes datasource override to Prisma since Prisma only auto-reads DATABASE_URL - Fix deadlock in PrismaWrapper.__getattr__: when called from inside a running event loop, schedule the token refresh as a background task instead of blocking with run_coroutine_threadsafe + future.result(), which would deadlock the loop thread waiting for a coroutine that needs the loop to run - Fix botocore crash when DATABASE_PORT is unset by defaulting to "5432" in both proxy_cli.py and PrismaWrapper.get_rds_iam_token(); passing None caused botocore to embed the literal string "None" in the presigned URL - Implement graceful reader degradation: reader connect/recreate failures are non-fatal; wrapper sets _reader_unavailable=True and silently routes reads to the writer to keep the proxy serving traffic during transient reader outages - Add PrismaClient.writer_db property so the reconnect smoke-test always validates the writer engine specifically; query_raw on the routing wrapper would route to the reader and not verify the newly-recreated writer - Expose DATABASE_URL_READ_REPLICA in Helm chart (values.yaml + deployment.yaml) via both plain value and secret key reference, and document the field in docker-compose.yml - Add 887-line test suite covering routing logic, IAM token refresh paths, reader degradation scenarios, datasource override behavior, and the deadlock regression Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-08 21:05:50 -07:00
yuneng-jiang	0bcff0214a	Merge pull request #27502 from BerriAI/litellm_/trusting-hoover-2bbbc8 fix(proxy): point /metrics 401 at the opt-out flag	2026-05-08 18:31:04 -07:00
yuneng-jiang	0824c4c77e	Merge pull request #27403 from BerriAI/litellm_otelGenaiCaptureMessageContent [Feat] Honor OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT	2026-05-08 18:17:00 -07:00
Yuneng Jiang	4f3608b15a	fix(proxy): point /metrics 401 at the opt-out flag Operators upgrading past `35bbca60b0` (which made /metrics auth default-on) see "Malformed API Key passed in. Ensure Key has 'Bearer ' prefix." with no hint that litellm_settings.require_auth_for_metrics_endpoint: false restores the previous unauthenticated behavior. Append that discovery hint to the existing 401 body so a Prometheus scraper that breaks after upgrade has a clear migration path. No behavior change.	2026-05-08 18:09:14 -07:00
ryan-crabbe-berri	13a193367f	feat(sso): show full IdP claims in /sso/debug/callback (#27498) * feat(sso): show full IdP claims in /sso/debug/callback The debug callback only displayed the proxy-parsed OpenID summary, so customers couldn't verify what custom claims (team_id, team_alias, roles, etc.) the IdP was actually returning. Render two new sections — Raw Claims (userinfo) and Access Token Claims (decoded JWT) — alongside the existing parsed view. Strip bearer tokens defense-in-depth in case a non-conforming IdP places them in its userinfo response. Resolves LIT-2838 * Update litellm/proxy/management_endpoints/ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(sso): hoist json.dumps out of f-string for py3.10 ruff --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-05-08 17:39:39 -07:00
oss-agent-shin	f2e97380d2	Add OpenRouter Qwen 3.6 Plus metadata (#27486) Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-08 16:25:45 -07:00
oss-agent-shin	ae67cecc22	Allow team admins to test model connections (#27487) Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-08 15:30:41 -07:00
Parijat Sharma	144279eb57	fix(ui): URL-encode team_id in teamInfoCall to handle special characters (#27466)	2026-05-08 10:46:46 -07:00
yuneng-jiang	98cd057f38	Merge pull request #27437 from BerriAI/litellm_merge_main_staging chore: merge main into internal_staging to restore lineage	2026-05-07 17:57:05 -07:00
Yuneng Jiang	faba1fda8a	Merge remote-tracking branch 'origin/main' into litellm_merge_main_staging Restore lineage between main and internal staging so the next staging->main promotion (#27436) can merge without conflicts. main was 2 commits ahead: - `6ff668c7aa` squash-merge of the previous staging->main promotion (#27245) - `8c9830eef9` feat(xai): add grok-4.3 (#27396), already present on staging The squash-merge has no shared lineage with the individual commits that went into staging, which is why git surfaced 13 textual conflicts despite both sides having the same logical content. Every conflicting file's main-side change came from `6ff668c7aa` only, and the matching staging-side changes are the post-promotion evolution. Resolved all 13 with --ours (staging's version is the latest evolution; main's snapshot is stale). The grok-4.3 entries auto-merged in model_prices_and_context_window.json and its backup, but were already on staging via an independent commit, so the net diff vs HEAD is empty for those files. Net new content from this merge: 12 lines added to ui/litellm-dashboard/package-lock.json -- npm 11 libc array tags on four existing entries, no functional impact.	2026-05-07 17:49:48 -07:00
yuneng-jiang	ee8d8c4137	Merge pull request #27431 from BerriAI/yj_bump_may7 [Infra] Bump versions	2026-05-07 17:40:22 -07:00
Yuneng Jiang	1f1963f1d0	Merge remote-tracking branch 'origin' into yj_bump_may7	2026-05-07 17:34:13 -07:00
yuneng-jiang	a6195cc7d7	Merge pull request #27433 from BerriAI/litellm_yj_may7 [Infra] Merge dev branch	2026-05-07 17:33:42 -07:00
Yuneng Jiang	5e2c283604	fix(docker): restore npm@11.14.0 lost in merge resolution Merge of cve-sweep-2026-05 into litellm_yj_may7 picked the older npm@11.12.1 line, regressing the bump in `f08b1b63fa` that cleared ip-address GHSA-v2v4-37r5-5v8g (npm@11.12.1 bundles ip-address@10.1.0; 11.14.0 bundles 10.1.1).	2026-05-07 17:25:10 -07:00
yuneng-jiang	5082f9bc71	Merge pull request #27225 from stuxf/cve-sweep-2026-05 [Security] Clear AWS Inspector CVE findings on Docker image	2026-05-07 17:20:11 -07:00
yuneng-jiang	309cc36f9d	Merge branch 'litellm_yj_may7' into cve-sweep-2026-05	2026-05-07 17:19:37 -07:00
yuneng-jiang	a20c020101	Merge pull request #27432 from BerriAI/litellm_/elegant-mestorf-549737 test(interactions): align openapi compliance with upstream rename outputs->steps	2026-05-07 17:18:27 -07:00
yuneng-jiang	d351abd76b	Merge pull request #27430 from BerriAI/litellm_fix/remove-separate-health-app fix: remove separate health app	2026-05-07 17:13:52 -07:00
Yuneng Jiang	3d67f00ede	test(interactions): align openapi compliance with upstream rename outputs->steps Google updated their Interactions OpenAPI spec (https://ai.google.dev/static/api/interactions.openapi.json), removing the readOnly 'outputs' property from CreateModelInteractionParams in favor of 'steps' (a polymorphic transcript array). The compliance test fetches the live spec, so it began failing on every PR once the spec flipped over. Update the asserted output-field list to match. Note: this only re-aligns the spec-shape assertion. Our SDK response types (litellm/types/interactions/generated.py) still expose 'outputs' and need to be regenerated separately to add 'steps'/Step variants and decide on a back-compat path for callers reading .outputs.	2026-05-07 17:09:25 -07:00
user	f74fd4382c	Drop liccheck black allowlist entry The entry was added to cover the now-reverted black 24.10.0 -> 26.3.1 bump. With the bump dropped, upstream's existing liccheck setup is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 00:04:11 +00:00
user	9a2aeb0a41	Drop diskcache changes from PR Revert disk_cache.py JSONDisk swap + remove test_disk_cache.py. The JSONDisk migration is backwards-incompatible (existing pickle caches become unreadable; non-JSON values raise unguarded TypeError on set) and warrants its own focused PR with a feature flag rather than riding along with the CVE/dep-bump sweep. CVE-2025-69872 remains unmitigated at the diskcache layer; users concerned about pickle-RCE on cache-dir writers can avoid Cache(type="disk") or pin a fork until upstream ships a fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 00:03:05 +00:00
user	f08b1b63fa	Bump npm 11.12.1 -> 11.14.0 to clear ip-address GHSA-v2v4-37r5-5v8g Empirical grype scan of the built runtime image flagged ip-address@10.1.0 (Medium) bundled inside /usr/local/lib/node_modules/npm. npm@11.14.0 bundles ip-address@10.1.1 which carries the fix. Verified by rebuilding the image and rescanning: ip-address finding gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 23:56:43 +00:00
user	4d6e2bc3da	Bump wolfi-base to latest (2026-05-05) Pin the new digest published 2026-05-05: sha256:31da6565... (from sha256:3258be47...). Delta in baseline packages: zlib 1.3.2-r2 -> 1.3.2-r3. glibc stays at 2.43-r7 (still the latest available; whatever further glibc fixes for CVE-2026-5450 / CVE-2026-5928 land in -r8+ from Chainguard, this PR doesn't touch those). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 23:36:04 +00:00
Yuneng Jiang	086a23753e	uv lock	2026-05-07 16:30:15 -07:00
Yuneng Jiang	44aecb6f66	bump: version 1.84.0 → 1.85.0	2026-05-07 16:28:33 -07:00
yuneng-jiang	40a490aed7	Merge pull request #27241 from BerriAI/litellm_/nifty-kilby-82870d [Infra] Packaging: Relax Core Runtime Pins To Ranges	2026-05-07 16:15:57 -07:00
Yuneng Jiang	9ae9b81c1b	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/nifty-kilby-82870d # Conflicts: # uv.lock	2026-05-07 16:10:22 -07:00
Yassin Kortam	451ce161fc	fix: remove separate health app	2026-05-07 16:04:56 -07:00
user	5bafa8b3a2	Drop dep bumps + black-26 reformat to clear fork CI policy PR was blocked by .github/workflows/guard-fork-dependencies.yml: fork PRs cannot modify uv.lock. Reverting: - uv.lock + pyproject.toml black bump (24.10.0 -> 26.3.1) and the 295 files of mechanical Black 26 reformat coupled to it - pyproject.toml diskcache extra change (kept the runtime mitigation in litellm/caching/disk_cache.py via JSONDisk) Kept: - Dockerfile cache narrowing (drops ~660 MB of uv build cache that surfaced cached setuptools as CVE findings) - litellm/caching/disk_cache.py: dc.JSONDisk to neutralize CVE-2025-69872 - ui/litellm-dashboard/package-lock.json + litellm-js/spend-logs/package-lock.json: next/postcss/hono/uuid CVE bumps (these are not blocked by the fork guard) - tests/test_litellm/caching/test_disk_cache.py - tests/code_coverage_tests/liccheck.ini: harmless black authorization Black + gitpython + langchain dep upgrades will need a follow-up from a maintainer pushing a branch in the canonical BerriAI/litellm repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 23:04:52 +00:00
user	63bda3f001	Merge remote-tracking branch 'upstream/litellm_internal_staging' into cve-sweep-2026-05 # Conflicts: # uv.lock	2026-05-07 23:03:28 +00:00
yuneng-jiang	0fb88d50dd	Merge pull request #27415 from BerriAI/litellm_/sweet-mcclintock-2b3656 [Fix] Realtime Tests: Update Deprecated OpenAI Model Pin	2026-05-07 15:46:51 -07:00
Yuneng Jiang	a43dc9f0b1	[Fix] Batches Tests: Remove VCR Auto-Marker Strip VCR wiring from the batches test conftest. Drops: - import of `_vcr_conftest_common` helpers - the `vcr_config` fixture, `pytest_recording_configure`, `_vcr_outcome_gate`, `pytest_runtest_makereport` - the `apply_vcr_auto_marker_to_items` call in `pytest_collection_modifyitems` - `VerboseReporterState` / its `pytest_configure` / `pytest_runtest_logreport` hooks (purely VCR-verdict plumbing) Why: every test in this directory creates ephemeral OpenAI / Bedrock / vLLM resources whose IDs change per run (file-XXX, batch-XXX, ft-XXX, ...). VCR's path/query/body matchers don't match across runs, so `record_mode="new_episodes"` was silently passing through to the live API and recording many new cassette entries every run. Cassette bloat without replay benefit. Behaviour after this change is identical to running the directory without `CASSETTE_REDIS_URL` set: tests that have keys hit live APIs, tests that don't continue to skip via their existing skipif markers. Conftest now keeps only path setup and the session-scoped `event_loop` fixture.	2026-05-07 15:39:46 -07:00
ishaan-berri	e4c14862fc	feat(mcp): add OBO MCP Auth (#27421) * feat(mcp): add oauth2 token exchange auth Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * fix(mcp): cache token exchange fallback Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-07 15:35:21 -07:00
Yuneng Jiang	5256a1fdb6	[Fix] Fine-Tuning Test: Bump Off Deprecated gpt-3.5-turbo-0125 OpenAI announced gpt-3.5-turbo-0125 (and fine-tuning of gpt-3.5-turbo in general) for shutdown on 2026-10-23, with the announcement landing 2026-04-22. The hard-fail date is ~5 months out, but timing fits the recent uptick in this test flaking and OpenAI may already be running the deprecated model's pipeline with deprioritized infra. Bump to gpt-4o-mini-2024-07-18 — currently supported for fine-tuning, no announced shutdown. Updates the live test plus the mocked test for consistency. Belt-and-suspenders with the existing propagation-retry helper.	2026-05-07 14:41:58 -07:00
Yuneng Jiang	a8cad84dc7	[Fix] Fine-Tuning Test: Retry on File Propagation 400 Previous fix polled `litellm.afile_retrieve` for `status == "processed"` before calling the fine-tuning endpoint. That doesn't actually solve the race: - OpenAI's `FileObject.status` field is deprecated per the SDK type and not authoritative — it can read "processed" before the file is usable. - The retrieve and fine-tuning endpoints don't share a consistency model, so retrieve succeeding tells you nothing about FT visibility. Replace with a retry around the actual `acreate_fine_tuning_job` call that catches the OpenAI 400 `'file-... does not exist'` and backs off exponentially (1s → cap 8s, 12 attempts, ~70s total budget). The operation succeeding is the only reliable signal that propagation finished.	2026-05-07 14:17:45 -07:00
Yuneng Jiang	a64716ed5b	[Fix] Fine-Tuning Test: Wait for OpenAI File Propagation OpenAI file uploads are eventually consistent — a freshly uploaded file may briefly 404 from `retrieve` and is rejected by the fine-tuning endpoint with `'file-... does not exist'` until processing finishes. The async fine-tuning test called `acreate_fine_tuning_job` immediately after `acreate_file` and flaked on this race. Add a polling helper that waits up to ~30s for `status=processed` (and short-circuits on `error`), called between upload and FT job creation. Mirrors the same propagation lag covered by the `await asyncio.sleep(1)` in the sister batches test, but more robust against longer delays.	2026-05-07 14:06:04 -07:00
Yuneng Jiang	4189d78a64	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/sweet-mcclintock-2b3656	2026-05-07 13:46:53 -07:00
Yuneng Jiang	9e7dc5ef68	[Fix] Realtime Tests: Update Deprecated OpenAI Model Pin OpenAI deprecated the gpt-4o-realtime-preview-2024-10-01 snapshot, which caused these E2E tests to fail consistently in CI. Bump to the unversioned gpt-4o-realtime-preview alias to match the sibling test_openai_realtime_simple.py and stay current as OpenAI rolls the alias forward.	2026-05-07 13:46:45 -07:00
michelligabriele	3b78a3a545	fix(chat-completions): decode unified file_id when model_file_id_mapping is unavailable (#27406) * fix(chat-completions): decode unified file_id when model_file_id_mapping is unavailable * fix(chat-completions): tolerate non-dict content items (e.g. token-ids from text_completion)	2026-05-07 13:17:04 -07:00
ishaan-berri	b891a201f8	Preserve LiteLLM headers for passthrough responses (#27412) Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-07 12:59:36 -07:00
Shivam Rawat	29e4eb16da	Merge pull request #27222 from BerriAI/litellm_s3AuditParams [Feat] Decouple S3 audit-log config via s3_audit_callback_params	2026-05-07 12:49:03 -07:00
yuneng-jiang	b9b315157b	Merge pull request #27409 from BerriAI/litellm_/inspiring-allen-ec64a4 [Fix] Tests: Reduce VCR cassette bloat and fix multipart caching	2026-05-07 12:39:58 -07:00
Michael-RZ-Berri	db8198faba	[Fix] Allow non-admin compliance path reads (#27234) * allow non-admin roles on /compliance/* read routes * Restrict compliance routes to internal users --------- Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-05-07 14:07:23 -05:00
Yuneng Jiang	2f9519d286	[Fix] Tests: Reduce VCR cassette bloat and fix multipart caching - Add `_strip_image_b64_payloads` filter: rewrites `data[*].b64_json` in image-gen responses to a 4-byte placeholder before the cassette is saved. Image-edit and image-gen cassettes (193 MB / 184 MB / 104 MB / ...) will shrink to <100 KB on next record. Tests assert response shape only, so coverage is preserved. - Add `_normalize_multipart_boundary` filter: replaces httpx's per-request random multipart boundary with a fixed string in both Content-Type header and body bytes. Audio-transcription / Whisper tests have been effectively unmocked — every CI run hit live providers and was silently capped at MAX_EPISODES_PER_CASSETTE=50. Both record and replay now see identical bytes; the safe_body matcher works. - Fix test_evals_api.py body poisoning: replace `int(time.time())` in eval names with `hashlib.sha1(test_node_name)[:12]`, add a function-scoped `managed_eval` fixture that creates and deletes the eval, and switch `get_eval` / `update_eval` from `list_evals().data[0].id` (which made the URL vary by run) to `managed_eval.id`. Net coverage gain: delete is now actually exercised. - Swap arxiv PDF URL in BaseOCRTest for the in-repo `dummy.pdf` (589 B) served via sha-pinned jsdelivr. - Swap etsystatic image URL in BaseLLMChatTest.test_image_url for the in-repo LiteLLM logo (9.2 KB) served via the same jsdelivr pin. - Add `tests/llm_translation/test_vcr_filters.py` with 14 unit tests covering both new filters: replacement, idempotency, nesting, content-length update, two-distinct-boundaries-converge-after-normalize, etc. Cassettes recorded with the prior patterns will mismatch on the first CI run after merge; recommend flushing the cassette Redis once (post-merge) so re-records save under the new format from the start.	2026-05-07 11:54:19 -07:00
michelligabriele	9f1b41d206	fix(proxy): run model-level post_call guardrails on streaming requests (#26922)	2026-05-07 11:53:03 -07:00
Cursor Agent	492118de25	Fix OTEL false content capture mode	2026-05-07 18:12:23 +00:00
Michael Riad Zaky	581ae1443d	honor OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT	2026-05-07 09:44:39 -07:00
Mateo Wang	8c9830eef9	feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_conte… (#27154) (#27396) * feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json xAI's docs page now lists grok-4.3 as the recommended chat / coding model: "We strongly recommend all API callers use grok-4.3. It is the most intelligent and fastest model we've built." (https://docs.x.ai/docs/models) Pricing/specs sourced from xAI's published model metadata: - input: $1.25 / 1M tokens (<=200k), $2.50 / 1M tokens (>200k) - output: $2.50 / 1M tokens (<=200k), $5.00 / 1M tokens (>200k) - cached: $0.20 / 1M tokens (<=200k), $0.40 / 1M tokens (>200k) - context: 1,000,000 tokens - capabilities: vision, reasoning, function calling, structured outputs, prompt caching, web search Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias), mirroring the pattern used for the rest of the xAI/Grok-4 family. * test(xai): add model_info test for grok-4.3 + sync backup cost map - Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into litellm/model_prices_and_context_window_backup.json so the bundled model cost map matches the canonical model_prices_and_context_window.json. - Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering pricing tiers, capability flags, context window, provider routing, and parity between the main and backup cost maps. - Point 'source' at the live xAI models page (the per-model URL https://docs.x.ai/docs/models/grok-4.3 currently 404s). --------- Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com> Co-authored-by: shin-watcher <shin-watcher@berri.ai> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-07 09:39:26 -07:00
ishaan-berri	fee5900acc	feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_conte… (#27154) * feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json xAI's docs page now lists grok-4.3 as the recommended chat / coding model: "We strongly recommend all API callers use grok-4.3. It is the most intelligent and fastest model we've built." (https://docs.x.ai/docs/models) Pricing/specs sourced from xAI's published model metadata: - input: $1.25 / 1M tokens (<=200k), $2.50 / 1M tokens (>200k) - output: $2.50 / 1M tokens (<=200k), $5.00 / 1M tokens (>200k) - cached: $0.20 / 1M tokens (<=200k), $0.40 / 1M tokens (>200k) - context: 1,000,000 tokens - capabilities: vision, reasoning, function calling, structured outputs, prompt caching, web search Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias), mirroring the pattern used for the rest of the xAI/Grok-4 family. * test(xai): add model_info test for grok-4.3 + sync backup cost map - Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into litellm/model_prices_and_context_window_backup.json so the bundled model cost map matches the canonical model_prices_and_context_window.json. - Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering pricing tiers, capability flags, context window, provider routing, and parity between the main and backup cost maps. - Point 'source' at the live xAI models page (the per-model URL https://docs.x.ai/docs/models/grok-4.3 currently 404s). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> --------- Co-authored-by: shin-watcher <shin-watcher@berri.ai> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-07 09:06:56 -07:00
harish-berri	a67b7a7e87	Refactor Bedrock response stream shape handling (#27257) * Refactor Bedrock response stream shape handling - Introduced a module-level constant `BEDROCK_RESPONSE_STREAM_SHAPE` to cache the response stream shape, eliminating the need for per-instance caching in `BedrockEventStreamDecoderBase`. - Updated relevant methods to utilize the new constant, improving performance by avoiding redundant loading of the shape. - Added tests to ensure the shape is loaded correctly at import time and is consistent across different modules. - Added a new mock server script for testing Bedrock pass-through functionality. * Refactor response parsing for Bedrock and SageMaker - Improved code readability by formatting the parsing method calls in `AWSEventStreamDecoder` for both Bedrock and SageMaker response stream shapes. - Added blank lines for better separation of code blocks in `invoke_handler.py` and `common_utils.py` to enhance maintainability. * Enhance error handling for Bedrock and SageMaker response stream shape loading - Wrapped the loading logic in `_load_bedrock_response_stream_shape` and `_load_sagemaker_response_stream_shape` with try-except blocks to gracefully handle exceptions. - Added logging to warn when the response stream shape cannot be pre-loaded, ensuring the module imports cleanly. - Updated tests to verify that loading failures return `None` instead of propagating exceptions. * Implement error handling for missing response stream shapes in Bedrock and SageMaker - Added checks in `_parse_message_from_event` methods to raise appropriate errors when `BEDROCK_RESPONSE_STREAM_SHAPE` or `SAGEMAKER_RESPONSE_STREAM_SHAPE` is None, ensuring clearer error reporting. - Updated logging messages to reflect the unavailability of event-stream decoding for both Bedrock and SageMaker. - Enhanced unit tests to verify that the correct exceptions are raised when the response stream shapes are not loaded.	2026-05-06 17:39:38 -07:00
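
Note on a8cad84dc7 ([Fix] Fine-Tuning Test: Retry on File Propagation 400): the message describes retrying the real call with exponential backoff (1s start, 8s cap, 12 attempts, about 70s total). The helper itself is not shown in this log, so the following is only a minimal sketch of that schedule. The names `backoff_delays` and `retry_with_backoff` are hypothetical, and the retry predicate is left to the caller.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int = 12, base: float = 1.0, cap: float = 8.0) -> List[float]:
    """Sleep durations between attempts: doubles from `base`, clamped at `cap`."""
    # N attempts need only N - 1 sleeps (there is no sleep after the last attempt).
    return [min(cap, base * (2 ** i)) for i in range(attempts - 1)]


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    is_retryable: Callable[[Exception], bool],
    delays: List[float],
) -> T:
    """Await `call`, sleeping through `delays` while `is_retryable(exc)` holds."""
    for delay in delays:
        try:
            return await call()
        except Exception as exc:
            # Anything other than the expected propagation error is re-raised at once.
            if not is_retryable(exc):
                raise
            await asyncio.sleep(delay)
    # Final attempt: any exception now propagates to the caller.
    return await call()
```

With the defaults, the sleeps are 1, 2, 4 and then eight waits of 8 seconds, which sums to 71 seconds and is in line with the "~70s total budget" quoted in the commit message. A caller could pass a predicate such as `lambda e: "does not exist" in str(e)` to retry only on the propagation error.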

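Note on 2f9519d286 ([Fix] Tests: Reduce VCR cassette bloat and fix multipart caching): the message says the new filter replaces the per-request random multipart boundary with a fixed string in both the Content-Type header and the body bytes, and updates the content length. The real `_normalize_multipart_boundary` is not shown in this log. The sketch below only illustrates the idea on a plain headers dict and bytes body; the function name, the `FIXED_BOUNDARY` value, and the exact-case header keys are all assumptions.

```python
import re
from typing import Dict, Tuple

FIXED_BOUNDARY = "vcr-fixed-boundary"  # hypothetical placeholder value

# Matches boundary=abc or boundary="abc" inside a Content-Type header value.
_BOUNDARY_RE = re.compile(r'boundary="?([^";,\s]+)"?')


def normalize_multipart_boundary(
    headers: Dict[str, str], body: bytes
) -> Tuple[Dict[str, str], bytes]:
    """Return copies of headers/body with the multipart boundary made deterministic."""
    content_type = headers.get("Content-Type", "")
    match = _BOUNDARY_RE.search(content_type)
    if not content_type.startswith("multipart/") or match is None:
        return headers, body  # not a multipart request: leave untouched

    original = match.group(1)
    new_headers = dict(headers)
    new_headers["Content-Type"] = content_type.replace(original, FIXED_BOUNDARY)
    new_body = body.replace(original.encode("ascii"), FIXED_BOUNDARY.encode("ascii"))

    # The boundary length may change, so keep Content-Length consistent if present.
    if "Content-Length" in new_headers:
        new_headers["Content-Length"] = str(len(new_body))
    return new_headers, new_body
```

Two requests that differ only in their random boundary produce identical headers and bytes after this pass, which is what lets a body matcher treat them as the same request. Running the function on its own output changes nothing.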

39037 Commits