litellm

Author	SHA1	Message	Date
user	5bafa8b3a2	Drop dep bumps + black-26 reformat to clear fork CI policy PR was blocked by .github/workflows/guard-fork-dependencies.yml: fork PRs cannot modify uv.lock. Reverting: - uv.lock + pyproject.toml black bump (24.10.0 -> 26.3.1) and the 295 files of mechanical Black 26 reformat coupled to it - pyproject.toml diskcache extra change (kept the runtime mitigation in litellm/caching/disk_cache.py via JSONDisk) Kept: - Dockerfile cache narrowing (drops ~660 MB of uv build cache that surfaced cached setuptools as CVE findings) - litellm/caching/disk_cache.py: dc.JSONDisk to neutralize CVE-2025-69872 - ui/litellm-dashboard/package-lock.json + litellm-js/spend-logs/package-lock.json: next/postcss/hono/uuid CVE bumps (these are not blocked by the fork guard) - tests/test_litellm/caching/test_disk_cache.py - tests/code_coverage_tests/liccheck.ini: harmless black authorization Black + gitpython + langchain dep upgrades will need a follow-up from a maintainer pushing a branch in the canonical BerriAI/litellm repo. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 23:04:52 +00:00
user	63bda3f001	Merge remote-tracking branch 'upstream/litellm_internal_staging' into cve-sweep-2026-05 # Conflicts: # uv.lock	2026-05-07 23:03:28 +00:00
yuneng-jiang	0fb88d50dd	Merge pull request #27415 from BerriAI/litellm_/sweet-mcclintock-2b3656 [Fix] Realtime Tests: Update Deprecated OpenAI Model Pin	2026-05-07 15:46:51 -07:00
Yuneng Jiang	a43dc9f0b1	[Fix] Batches Tests: Remove VCR Auto-Marker Strip VCR wiring from the batches test conftest. Drops: - import of `_vcr_conftest_common` helpers - the `vcr_config` fixture, `pytest_recording_configure`, `_vcr_outcome_gate`, `pytest_runtest_makereport` - the `apply_vcr_auto_marker_to_items` call in `pytest_collection_modifyitems` - `VerboseReporterState` / its `pytest_configure` / `pytest_runtest_logreport` hooks (purely VCR-verdict plumbing) Why: every test in this directory creates ephemeral OpenAI / Bedrock / vLLM resources whose IDs change per run (file-XXX, batch-XXX, ft-XXX, ...). VCR's path/query/body matchers don't match across runs, so `record_mode="new_episodes"` was silently passing through to the live API and recording many new cassette entries every run. Cassette bloat without replay benefit. Behaviour after this change is identical to running the directory without `CASSETTE_REDIS_URL` set: tests that have keys hit live APIs, tests that don't continue to skip via their existing skipif markers. Conftest now keeps only path setup and the session-scoped `event_loop` fixture.	2026-05-07 15:39:46 -07:00
ishaan-berri	e4c14862fc	feat(mcp): add OBO MCP Auth (#27421) * feat(mcp): add oauth2 token exchange auth Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * fix(mcp): cache token exchange fallback Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-07 15:35:21 -07:00
Yuneng Jiang	5256a1fdb6	[Fix] Fine-Tuning Test: Bump Off Deprecated gpt-3.5-turbo-0125 OpenAI announced gpt-3.5-turbo-0125 (and fine-tuning of gpt-3.5-turbo in general) for shutdown on 2026-10-23, with the announcement landing 2026-04-22. The hard-fail date is ~5 months out, but timing fits the recent uptick in this test flaking and OpenAI may already be running the deprecated model's pipeline with deprioritized infra. Bump to gpt-4o-mini-2024-07-18 — currently supported for fine-tuning, no announced shutdown. Updates the live test plus the mocked test for consistency. Belt-and-suspenders with the existing propagation-retry helper.	2026-05-07 14:41:58 -07:00
Yuneng Jiang	a8cad84dc7	[Fix] Fine-Tuning Test: Retry on File Propagation 400 Previous fix polled `litellm.afile_retrieve` for `status == "processed"` before calling the fine-tuning endpoint. That doesn't actually solve the race: - OpenAI's `FileObject.status` field is deprecated per the SDK type and not authoritative — it can read "processed" before the file is usable. - The retrieve and fine-tuning endpoints don't share a consistency model, so retrieve succeeding tells you nothing about FT visibility. Replace with a retry around the actual `acreate_fine_tuning_job` call that catches the OpenAI 400 `'file-... does not exist'` and backs off exponentially (1s → cap 8s, 12 attempts, ~70s total budget). The operation succeeding is the only reliable signal that propagation finished.	2026-05-07 14:17:45 -07:00
Yuneng Jiang	a64716ed5b	[Fix] Fine-Tuning Test: Wait for OpenAI File Propagation OpenAI file uploads are eventually consistent — a freshly uploaded file may briefly 404 from `retrieve` and is rejected by the fine-tuning endpoint with `'file-... does not exist'` until processing finishes. The async fine-tuning test called `acreate_fine_tuning_job` immediately after `acreate_file` and flaked on this race. Add a polling helper that waits up to ~30s for `status=processed` (and short-circuits on `error`), called between upload and FT job creation. Mirrors the same propagation lag covered by the `await asyncio.sleep(1)` in the sister batches test, but more robust against longer delays.	2026-05-07 14:06:04 -07:00
Yuneng Jiang	4189d78a64	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/sweet-mcclintock-2b3656	2026-05-07 13:46:53 -07:00
Yuneng Jiang	9e7dc5ef68	[Fix] Realtime Tests: Update Deprecated OpenAI Model Pin OpenAI deprecated the gpt-4o-realtime-preview-2024-10-01 snapshot, which caused these E2E tests to fail consistently in CI. Bump to the unversioned gpt-4o-realtime-preview alias to match the sibling test_openai_realtime_simple.py and stay current as OpenAI rolls the alias forward.	2026-05-07 13:46:45 -07:00
michelligabriele	3b78a3a545	fix(chat-completions): decode unified file_id when model_file_id_mapping is unavailable (#27406) * fix(chat-completions): decode unified file_id when model_file_id_mapping is unavailable * fix(chat-completions): tolerate non-dict content items (e.g. token-ids from text_completion)	2026-05-07 13:17:04 -07:00
ishaan-berri	b891a201f8	Preserve LiteLLM headers for passthrough responses (#27412) Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-07 12:59:36 -07:00
Shivam Rawat	29e4eb16da	Merge pull request #27222 from BerriAI/litellm_s3AuditParams [Feat] Decouple S3 audit-log config via s3_audit_callback_params	2026-05-07 12:49:03 -07:00
yuneng-jiang	b9b315157b	Merge pull request #27409 from BerriAI/litellm_/inspiring-allen-ec64a4 [Fix] Tests: Reduce VCR cassette bloat and fix multipart caching	2026-05-07 12:39:58 -07:00
Michael-RZ-Berri	db8198faba	[Fix] Allow non-admin compliance path reads (#27234) * allow non-admin roles on /compliance/* read routes * Restrict compliance routes to internal users --------- Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-05-07 14:07:23 -05:00
Yuneng Jiang	2f9519d286	[Fix] Tests: Reduce VCR cassette bloat and fix multipart caching - Add `_strip_image_b64_payloads` filter: rewrites `data[*].b64_json` in image-gen responses to a 4-byte placeholder before the cassette is saved. Image-edit and image-gen cassettes (193 MB / 184 MB / 104 MB / ...) will shrink to <100 KB on next record. Tests assert response shape only, so coverage is preserved. - Add `_normalize_multipart_boundary` filter: replaces httpx's per-request random multipart boundary with a fixed string in both Content-Type header and body bytes. Audio-transcription / Whisper tests have been effectively unmocked — every CI run hit live providers and was silently capped at MAX_EPISODES_PER_CASSETTE=50. Both record and replay now see identical bytes; the safe_body matcher works. - Fix test_evals_api.py body poisoning: replace `int(time.time())` in eval names with `hashlib.sha1(test_node_name)[:12]`, add a function-scoped `managed_eval` fixture that creates and deletes the eval, and switch `get_eval` / `update_eval` from `list_evals().data[0].id` (which made the URL vary by run) to `managed_eval.id`. Net coverage gain: delete is now actually exercised. - Swap arxiv PDF URL in BaseOCRTest for the in-repo `dummy.pdf` (589 B) served via sha-pinned jsdelivr. - Swap etsystatic image URL in BaseLLMChatTest.test_image_url for the in-repo LiteLLM logo (9.2 KB) served via the same jsdelivr pin. - Add `tests/llm_translation/test_vcr_filters.py` with 14 unit tests covering both new filters: replacement, idempotency, nesting, content- length update, two-distinct-boundaries-converge-after-normalize, etc. Cassettes recorded with the prior patterns will mismatch on the first CI run after merge; recommend flushing the cassette Redis once (post-merge) so re-records save under the new format from the start.	2026-05-07 11:54:19 -07:00
michelligabriele	9f1b41d206	fix(proxy): run model-level post_call guardrails on streaming requests (#26922)	2026-05-07 11:53:03 -07:00
ishaan-berri	fee5900acc	feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_conte… (#27154 ) * feat(xai): add grok-4.3 and grok-4.3-latest to model_prices_and_context_window.json xAI's docs page now lists grok-4.3 as the recommended chat / coding model: "We strongly recommend all API callers use grok-4.3. It is the most intelligent and fastest model we've built." (https://docs.x.ai/docs/models) Pricing/specs sourced from xAI's published model metadata: - input: $1.25 / 1M tokens (<=200k), $2.50 / 1M tokens (>200k) - output: $2.50 / 1M tokens (<=200k), $5.00 / 1M tokens (>200k) - cached: $0.20 / 1M tokens (<=200k), $0.40 / 1M tokens (>200k) - context: 1,000,000 tokens - capabilities: vision, reasoning, function calling, structured outputs, prompt caching, web search Adds two entries: `xai/grok-4.3` (canonical) and `xai/grok-4.3-latest` (alias), mirroring the pattern used for the rest of the xAI/Grok-4 family. * test(xai): add model_info test for grok-4.3 + sync backup cost map - Mirror xai/grok-4.3 and xai/grok-4.3-latest entries into litellm/model_prices_and_context_window_backup.json so the bundled model cost map matches the canonical model_prices_and_context_window.json. - Add tests/test_litellm/test_xai_grok_4_3_model_metadata.py covering pricing tiers, capability flags, context window, provider routing, and parity between the main and backup cost maps. - Point 'source' at the live xAI models page (the per-model URL https://docs.x.ai/docs/models/grok-4.3 currently 404s). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> --------- Co-authored-by: shin-watcher <shin-watcher@berri.ai> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-07 09:06:56 -07:00
harish-berri	a67b7a7e87	Refactor Bedrock response stream shape handling (#27257 ) * Refactor Bedrock response stream shape handling - Introduced a module-level constant `BEDROCK_RESPONSE_STREAM_SHAPE` to cache the response stream shape, eliminating the need for per-instance caching in `BedrockEventStreamDecoderBase`. - Updated relevant methods to utilize the new constant, improving performance by avoiding redundant loading of the shape. - Added tests to ensure the shape is loaded correctly at import time and is consistent across different modules. - Added a new mock server script for testing Bedrock pass-through functionality. * Refactor response parsing for Bedrock and SageMaker - Improved code readability by formatting the parsing method calls in `AWSEventStreamDecoder` for both Bedrock and SageMaker response stream shapes. - Added blank lines for better separation of code blocks in `invoke_handler.py` and `common_utils.py` to enhance maintainability. * Enhance error handling for Bedrock and SageMaker response stream shape loading - Wrapped the loading logic in `_load_bedrock_response_stream_shape` and `_load_sagemaker_response_stream_shape` with try-except blocks to gracefully handle exceptions. - Added logging to warn when the response stream shape cannot be pre-loaded, ensuring the module imports cleanly. - Updated tests to verify that loading failures return `None` instead of propagating exceptions. * Implement error handling for missing response stream shapes in Bedrock and SageMaker - Added checks in `_parse_message_from_event` methods to raise appropriate errors when `BEDROCK_RESPONSE_STREAM_SHAPE` or `SAGEMAKER_RESPONSE_STREAM_SHAPE` is None, ensuring clearer error reporting. - Updated logging messages to reflect the unavailability of event-stream decoding for both Bedrock and SageMaker. - Enhanced unit tests to verify that the correct exceptions are raised when the response stream shapes are not loaded.	2026-05-06 17:39:38 -07:00
ishaan-berri	854456f58e	Fix Prometheus remaining metric zero values (#27348) Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 17:22:20 -07:00
yuneng-jiang	f1c91d754d	[Chore] CI: Block PRs that drop overall code coverage (#27340 ) * [Chore] CI: Block PRs that drop overall code coverage Tighten Codecov project status threshold from 1% to 0% so any drop in overall project coverage relative to the base commit fails the codecov/project check. target: auto keeps the bar floating with the codebase, no manual maintenance needed as coverage moves up over time. * [Chore] CI: Always post Codecov status regardless of CI outcome Set codecov.require_ci_to_pass: false and codecov.notify.wait_for_ci: false so Codecov posts the codecov/project and codecov/patch checks as soon as the expected uploads arrive, instead of withholding them when unrelated CI jobs fail. The coverage-regression check is independent of test pass/fail, and CI failures are already enforced by their own required-status checks.	2026-05-06 16:41:50 -07:00
yuneng-jiang	a3a42c6c47	[Chore] CI: Assign test_request_size_limit_middleware To Proxy-Runtime Shard (#27341 ) The assert-shard-coverage guard in test-unit-proxy-db.yml failed because test_request_size_limit_middleware.py was added under tests/proxy_unit_tests/ but not referenced by any matrix entry. Assigning it to the proxy-runtime shard, which already covers other server-runtime tests (proxy_routes, proxy_gunicorn, server_root_path).	2026-05-06 16:34:45 -07:00
oss-agent-shin	b318231fe9	Add Azure Sentinel audit log support (#27280 ) * Add Azure Sentinel audit log callback support Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * Fix Azure Sentinel audit log batching Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * Fix Azure Sentinel CI checks Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 15:50:06 -07:00
ishaan-berri	aba131d3cf	fix: Vertex Anthropic streaming status error hangs (#27310 ) * Fix streaming HTTP status error hangs Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * Fix sync streaming HTTP status error hangs Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * Cap sync streaming error read workers Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 15:32:55 -07:00
ishaan-berri	c15718f9d1	Fix Anthropic streaming reasoning token usage (#27319 ) * fix anthropic streaming reasoning token usage Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * test anthropic streaming reasoning usage end to end Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * address anthropic reasoning token text split Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * harden anthropic reasoning usage for mocked tokens Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 15:28:22 -07:00
ishaan-berri	bd1a05aed9	Fix MCP DB reload partial failures (#27314 ) * Fix MCP database reload partial failures Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * Avoid staged MCP registry exposure Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 15:18:18 -07:00
ishaan-berri	924c141843	Add new chat model metadata (#27313 ) * add new model metadata Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * address review feedback Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 15:15:21 -07:00
ishaan-berri	487479eff7	perf: cap Prometheus end-user metric cardinality with TTL + LRU eviction (#27272) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-06 13:35:13 -07:00
oss-agent-shin	c8e47dcb43	Fix early proxy request size enforcement (#27311 ) * Add early proxy request size guard Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * Address request size review feedback Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 12:29:11 -07:00
Dibyo Mukherjee	169c436684	Fix/member access group team (#27317 ) * fix(auth): pass team_id in member-level model access check _check_team_member_model_access calls _can_object_call_model without team_id, so access groups defined via model_info.access_groups cannot resolve for team-scoped DB models (their internal router name is model_name_<team>_<uuid>, not the public name). The team-level check already passes team_id; this mirrors that. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(auth): add tests for member-level access group resolution with team_id Eight tests covering _can_object_call_model and _check_team_member_model_access with team-scoped DB models: - access group resolves when team_id is passed - access group fails without team_id (pre-fix behavior) - literal model name still works with team_id (no regression) - denied model still denied with team_id - second model in group also reachable - end-to-end member access via access group (mocked membership) - end-to-end member denied for model not in allowed list - no-override member inherits team-level check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-06 12:05:22 -07:00
oss-agent-shin	d90cf56245	Fix SCIM user lookup filters (#27308 ) * Fix SCIM Okta userName lookup Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * fix scim user filter typing Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 11:58:47 -07:00
ishaan-berri	c92a08a307	Fix team member budget enforcement without user row (#27273 ) * Fix team member budget enforcement without user row Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * Clarify regenerated key budget repro Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 11:42:29 -07:00
Yassin Kortam	b1f577199a	fix(proxy): keep spend log cleanup running after batch failures and surface DB errors (#27303) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-06 18:39:15 +00:00
Mateo Wang	b83d11351f	proxy: hot-reload config YAML when --reload is set (#27274 ) * proxy: hot-reload config YAML when --reload is set Uvicorn's --reload only watches .py by default, so editing the --config YAML did not restart the proxy. _get_reload_options() now extends reload_dirs/reload_includes with the config file's directory and basename when --config is provided. proxy: qualify reload_includes with absolute config path Address Greptile review on PR #27274. When the --config file lives outside cwd, reload_includes previously stored only the basename, which meant uvicorn/watchfiles would also reload on edits to any same-named file inside cwd. Use the absolute config path as the include pattern in that case so only the actual proxy config triggers a restart. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(proxy): use basename for reload_includes config pattern Uvicorn's resolve_reload_patterns() calls pathlib.Path.glob(), which raises NotImplementedError on absolute patterns (uvicorn discussion 2156). Passing config_abs (an absolute path) when the config file lived outside cwd crashed startup under --reload. The config_dir is already added to reload_dirs, so using just the basename as the include pattern is sufficient to match the specific config file. * fix: make it reload app when yaml changes * style: remove unneeded comments --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-06 16:06:58 +00:00
Yassin Kortam	bd1ea0252a	perf(proxy): run daily activity aggregation off the event loop (#27264) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 20:19:28 -07:00
ishaan-berri	c32ad90823	Fix Prometheus custom metadata label counts (#27268 ) (#27271 ) * Fix Prometheus custom metadata label counts (#27268) Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * fix enterprise test: update positional label assertions to keyword args prometheus_label_factory now calls .labels() with keyword arguments. Update test_async_log_failure_event assertion to match. --------- Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-05 20:04:56 -07:00
ishaan-berri	e9fb29061a	Include model name + configured TPM/RPM in priority rate-limit 429 er… (#27216 ) * Include model name + configured TPM/RPM in priority rate-limit 429 errors (#27215) * Include model name + configured TPM/RPM in priority rate-limit 429 errors The current 429 message ('Priority-based rate limit exceeded. Priority: prod, Rate limit type: tokens, Remaining: -664145, Model saturation: 86.3%') doesn't tell the operator which model was hit or what the configured limit is, so they can't tell whether the priority allocation needs tuning or the model TPM is just too small. Add Model, Model TPM, and Model RPM to both the priority-based 429 and the sibling Model-capacity 429 in dynamic_rate_limiter_v3._check_rate_limits. Pure error-message change — no behavior or schema impact. * test: assert priority 429 includes model name + configured TPM/RPM Adds a regression test for the new fields in the priority-based 429 detail ('Model:', 'Model TPM:', 'Model RPM:'). Verified locally that the test fails against the unpatched dynamic_rate_limiter_v3.py and passes after the patch. --------- Co-authored-by: shin-watcher <ext-agent-shin@berri.ai> * Update litellm/proxy/hooks/dynamic_rate_limiter_v3.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update litellm/proxy/hooks/dynamic_rate_limiter_v3.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: shin-watcher <ext-agent-shin@berri.ai> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-05-05 19:05:22 -07:00
Dennis Henry	73de892654	fix: replace user api key auth with authorization or cookie for mcp server creation (#27190) * fix: replace user api key auth with authorization or cookie for mcp server creation * updated tests	2026-05-05 18:36:22 -07:00
Michael-RZ-Berri	e75c7a312a	union x-litellm-tags with static team/key tags (#27247) Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>	2026-05-05 17:46:42 -07:00
Mateo Wang	fdaa288607	ci(circleci): enable Rerun Failed Tests for all pytest jobs (#27155) * ci(circleci): enable Rerun Failed Tests for all pytest suites Migrated every pytest-based CircleCI job that uploads JUnit results to use 'circleci tests run' instead of invoking pytest directly. This is the prerequisite for CircleCI's 'Rerun failed tests' feature to be available on each job in the pipeline. For each job: - Glob test files via 'circleci tests glob' and pipe them into 'circleci tests run --command="xargs ... pytest ..."' so the agent can feed the failed-test subset on rerun. - Preserve all original pytest flags (parallelism, timeouts, retries, coverage, junit output paths). - For jobs that previously lacked 'store_test_results' (proxy spend accuracy, proxy_build_from_pip, db_migration_disable_update_check), add the step so JUnit XML is uploaded and rerun is actually wired up. - Replace the dynamic IGNORE_DIRS shell array in llm_translation_testing with a 'grep -v' filter on the glob output, matching the previous behavior of skipping tests/llm_translation/realtime. - For 'build_and_test', glob 'tests/test_*.py' (top-level only) which matches the prior 'tests/*.py' shell glob; the long list of '--ignore=tests/<subdir>' flags was vestigial and is dropped. Jobs already using 'circleci tests run' (local_testing_part1/2, litellm_router_testing) are unchanged. * fix(ci): convert classnames to file paths on rerun CircleCI's Rerun Failed Tests sends each previously failed test as a JUnit classname (e.g. 'tests.otel_tests.test_key_logging_callbacks'), but pytest needs a file path. Without the awk preprocess step, rerun runs fail with 'file or directory not found'. Mirror the awk transform that local_testing_part1, local_testing_part2, and litellm_router_testing already use, so rerun works in every job that this PR migrated to 'circleci tests run'. * ci: drop -x from OTEL pytest run so all failures are reported --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-05-05 17:27:09 -07:00
Sameer Kankute	fd7ff0f269	fix(hosted_vllm): normalize custom tools for chat completions (#25763 ) * fix(hosted_vllm): normalize custom tools for chat completions Convert custom tool definitions into OpenAI function tools before forwarding hosted_vllm chat requests to avoid provider-side validation failures. Add a regression test and include a local curl verification screenshot. Made-with: Cursor * Fix black issue * Fix hosted vllm custom tool schema fallback * fix black --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-05-05 17:27:02 -07:00
yuneng-jiang	9a338e1b6b	[Test] Tests: Stop parametrizing API keys into pytest test IDs (#27249 ) Several tests parametrized over (model, api_key, ...) tuples or raw token strings, causing pytest to embed those values in the test ID and print them in CI logs. Refactored each affected test to keep the same coverage without putting key material into parametrize. - audio_tests/test_audio_speech.py: split env-var keys into separate azure/openai test functions sharing a helper; sync_mode parametrize preserved. - audio_tests/test_whisper.py: split into openai_whisper / azure_whisper functions sharing a helper; response_format parametrize preserved. - local_testing/test_embedding.py: single-case parametrize inlined. - proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize cases split into 5 named tests sharing an _assert helper. - proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split into 4 named tests. - test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split into 5 named tests. Verified: black clean; 14 refactored unit tests pass; pytest collects audio/embedding tests with safe IDs (no key material in test IDs).	2026-05-05 17:21:18 -07:00
Sameer Kankute	e912e6d4ff	feat(audio_transcription): add NVIDIA Riva STT provider (#27185 ) * feat(audio_transcription): add NVIDIA Riva STT provider Adds nvidia_riva as a new audio transcription provider, supporting both NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming. - Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy, audioread fallback) so callers can send any common format. - Maps OpenAI params: language (en -> en-US), response_format (text/json/ verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets, word offsets converted ms -> s for verbose_json. - Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted otherwise (SSL off by default), with explicit use_ssl override. - gRPC errors wrapped via NvidiaRivaException -> litellm exception classes. - Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client, soundfile, audioread, numpy). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(nvidia_riva): address PR review feedback - handler: forward call-level `timeout` to streaming_response_generator (kwarg-detected via inspect for older riva-client compat) so a stalled Riva server cannot block the caller indefinitely. - audio_utils: spill bytes to a tempfile before audioread.audio_open; most audioread backends (FFmpeg, GStreamer) require a real filesystem path and previously raised TypeError on BytesIO, breaking the mp3/m4a fallback path. - audio_utils: prefer soxr / scipy.signal.resample_poly for resampling (anti-aliased polyphase) when installed, falling back to linear only as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples. - transformation: bare `es` now maps to es-ES (Castilian) instead of es-US, matching BCP-47 conventions. 
Co-authored-by: Cursor <cursoragent@cursor.com> * chore: trigger CI re-run [stabilize loop 1/3] * Update litellm/llms/nvidia_riva/audio_transcription/transformation.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * chore: trigger CI re-run [stabilize loop 1/3] * fix code qa * fix lint * fix mypy * fix mypy * Fix NVIDIA Riva ASR service lookup * Fix NVIDIA Riva transcription payload logging --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>	2026-05-05 17:17:51 -07:00
Krrish Dholakia	454ce5073f	fix(anthropic, mcp): sanitize tool names to match Anthropic's [a-zA-Z0-9_-]{1,128} pattern (#26788 ) * fix(anthropic, mcp): sanitize tool names to match Anthropic's `^[a-zA-Z0-9_-]{1,128}$` Tool names with characters like `/` or `.` (commonly produced by the OpenAPI -> MCP generator from `operationId`s such as `actions/download-job-logs-for-workflow-run`) caused Anthropic to reject requests with `tools.N.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'`. Two layers of fix: 1. Anthropic transformation: build a per-request forward map (original -> sanitized, disambiguated by suffix on collisions) and a reverse map (only for names actually rewritten). Forward map is applied to tool defs, `tool_choice`, and historical assistant tool_calls in messages. Reverse map is threaded through both the non-streaming and streaming response paths so callers continue to see their original tool names in `tool_use` blocks. 2. OpenAPI -> MCP generator: sanitize `operationId` (and the method+path fallback) at registration time so generated MCP tools are valid for any strict-name provider, not just Anthropic. The dashboard preview endpoint applies the same sanitization for parity. Includes unit tests covering: collision disambiguation between `foo_bar` and `foo/bar` in the same request, reverse-map only firing for actually-rewritten names, message rewrite for historical tool_calls, streaming chunk_parser reverse-mapping, and sanitization of OpenAPI operationIds plus the preview endpoint output. Made-with: Cursor * fix(anthropic): build tool-name maps in transform_request, not optional_params The previous patch stashed the per-request forward and reverse tool-name maps under ``optional_params["_anthropic_tool_name_forward_map"]`` and ``optional_params["_anthropic_tool_name_map"]``. 
``optional_params`` is the dict that becomes the JSON body via ``data = {**optional_params}``, so those internal keys leaked over the wire and Anthropic 400'd with: _anthropic_tool_name_forward_map: Extra inputs are not permitted Worse, this meant *every* request whose tool list contained any name with an invalid character (the exact case the patch was meant to fix) regressed into a confusing meta-error pointing at LiteLLM's internal map instead of the offending tool. Fix: move all tool-name sanitization into ``transform_request``, which is the single chokepoint already shared by ``AnthropicConfig``, ``AmazonAnthropicConfig`` (Bedrock invoke), ``VertexAIAnthropicConfig``, and ``AzureAnthropicConfig`` (all call ``super().transform_request`` / ``AnthropicConfig.transform_request(self, ...)``). New static helper ``_sanitize_tool_names_in_request`` walks the already-Anthropic-shaped ``optional_params["tools"]`` (only ``type=="custom"`` entries -- hosted tool names are reserved by Anthropic and must not be touched), builds the per-request forward/reverse maps, and applies the forward map in place to ``tools[].name`` and ``tool_choice.name``. The reverse map is stashed exclusively on ``litellm_params`` (which is never serialized to a provider) under ``_anthropic_tool_name_map`` for the response paths to consume. Side effect of this restructure: ``map_openai_params`` is now a pure OpenAI->Anthropic param translator with no side-channel state, which matches its contract everywhere else in the codebase. Tests: replaced the now-incorrect "stashes maps in optional_params" tests with regressions that assert no underscore-prefixed keys appear in either ``optional_params`` after ``map_openai_params`` or in the final ``transform_request`` body. Added end-to-end coverage for: sanitization in ``transform_request``, ``tool_choice`` rewriting, historical ``tool_calls`` rewriting in messages, and hosted-tool passthrough. 
Made-with: Cursor fix(anthropic): always sanitize empty text content blocks Anthropic 400s on `{"role": "user", "content": ""}` with: "messages: text content blocks must be non-empty" LiteLLM already had `_sanitize_empty_text_content` to rewrite empty text to a placeholder, but it was gated behind `litellm.modify_params=True`. With that flag off (default), empty content from upstream agent frameworks (e.g. pydantic-ai) flowed straight through and tripped the Anthropic validator. Fix: - Always run `_sanitize_empty_text_content` at the top of `anthropic_messages_pt`, independent of `modify_params`. There is no way to "pass through" an empty text block, so this is non-optional. The richer tool-call sanitizations (Cases A/B/D, which actually mutate conversation structure) remain gated on `modify_params`. - Extend `_sanitize_empty_text_content` to also handle list-of-blocks content (`[{"type": "text", "text": ""}]`), not just string content. Adds 3 regression tests covering string content, list-of-blocks content, and the no-op case (non-empty messages with modify_params off). Made-with: Cursor * fix(anthropic): drop dead tool-name forward-map params, fix mypy + caller-mutation - remove unused `name_forward_map` param from `_map_tool_choice`, `_map_tool_helper`, `_map_tools` and the `_apply_anthropic_tool_name_forward` helper. Production sanitization runs in `_sanitize_tool_names_in_request` at `transform_request`; these params were never threaded through. - handler.py: use `ANTHROPIC_TOOL_NAME_REVERSE_MAP_KEY` constant instead of the hardcoded `"_anthropic_tool_name_map"` string. - fix mypy `"object" has no attribute "__iter__"` in `_rewrite_tool_names_in_messages` by guarding `tool_calls` with `isinstance(..., list)`. - `_sanitize_tool_names_in_request`: build a new tools list with copy-on- change entries (and copy `tool_choice` on rewrite) so a caller reusing the same tool list/dicts across requests doesn't see its inputs permanently rewritten. 
- doc-comment `_build_request_tool_name_maps` clarifying it operates on OpenAI-format tools (vs `_sanitize_tool_names_in_request` which runs on Anthropic-format tools post-`_map_tools`). - tests: drop 3 tests pinning the now-removed param paths; add coverage for tool_calls + None function_call rewrite and caller-dict immutability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(mcp): inherit stored credentials in test/tools/list for edit flow When editing an existing MCP server, the Tool Configuration preview calls POST /mcp-rest/test/tools/list with server_id but no credentials (management API redacts them). The endpoint now calls _inherit_credentials_from_existing_server() so stored bearer tokens and OAuth2 M2M credentials are loaded from global_mcp_server_manager automatically — tools load without re-entering credentials. New servers (no server_id) and requests with explicit credentials are unaffected (function is a no-op in both cases). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(mcp): show all tools in edit panel, not just allowed tools Edit flow was passing externalTools (from GET /tools/list, filtered by allowed_tools) to MCPToolConfiguration, disabling the internal hook. Remove the external props so the internal hook fires via POST /test/tools/list, which returns all tools unfiltered. Combined with the credential inheritance fix, tools load automatically without re-entering credentials and all tools are visible for re-configuration. existingAllowedTools still pre-checks previously allowed tools. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Fix order-dependent collision in _build_anthropic_tool_name_maps Use a two-pass approach: first pre-register all already-valid tool names in the 'used' set, then sanitize/disambiguate names that need rewriting. This ensures valid names always have priority regardless of input order, preventing duplicate tool names on the wire when e.g. 
'foo/bar' appears before 'foo_bar' in the tool list. Add regression test for the reversed ordering case. * Fix OpenAPI tool name collision: disambiguate sanitized names with numeric suffixes sanitize_openapi_tool_name replaces all invalid chars with '_', but when two operationIds differ only by sanitized characters (e.g. 'foo/list' and 'foo.list' both become 'foo_list'), the second registration silently overwrites the first in the tool registry. Add collision disambiguation in register_tools_from_openapi that appends _2, _3, ... suffixes when a sanitized name is already taken, mirroring the existing logic in _build_anthropic_tool_name_maps. * Fix preview endpoint missing collision disambiguation for tool names Add used_names tracking and _2/_3 suffix disambiguation to _preview_openapi_tools, matching the logic in register_tools_from_openapi. Without this, two operationIds that sanitize to the same string (e.g. 'foo/list' and 'foo.list' both becoming 'foo_list') would show duplicate names in the preview while registration would disambiguate them. * Align preview HTTP method order with register_tools_from_openapi The preview endpoint and register_tools_from_openapi both use order-dependent collision disambiguation (_2, _3 suffixes). When the iteration order differs, two operations on the same path with sanitized names that collide get different suffixes in preview vs registration, so the dashboard shows names that don't match what actually got registered. Also adds a regression test that fails on the swapped order. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * Skip duplicate originals in _build_anthropic_tool_name_maps If the same invalid tool name appeared twice in original_names (e.g. ['foo/bar', 'foo/bar']), the second occurrence overwrote the forward map entry with a freshly-suffixed name (foo_bar_2), leaving foo_bar orphaned in 'used' with no reverse mapping. 
_sanitize_tool_names_in_request then rewrote both tool entries to foo_bar_2, and Anthropic 400'd on duplicate tool names. Skip the rewrite if forward already has the original mapped. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-06 00:00:36 +00:00
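The two-pass name mapping described in the commit above (454ce5073f) can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the function name `build_tool_name_maps` is hypothetical, and truncating to 128 characters before adding a suffix is an assumption made here so the result always fits the quoted pattern.

```python
import re
from typing import Dict, List, Tuple

# Pattern quoted in the commit message: ^[a-zA-Z0-9_-]{1,128}$
VALID_NAME = re.compile(r"[a-zA-Z0-9_-]{1,128}")
INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")
MAX_LEN = 128


def build_tool_name_maps(
    original_names: List[str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical helper: return (forward, reverse) name maps.

    forward: original -> sanitized, only for names that had to be rewritten.
    reverse: sanitized -> original, for restoring names in responses.
    """
    forward: Dict[str, str] = {}
    reverse: Dict[str, str] = {}

    # Pass 1: reserve every already-valid name so it keeps priority
    # regardless of where it appears in the input list.
    used = {name for name in original_names if VALID_NAME.fullmatch(name)}

    # Pass 2: sanitize the remaining names, adding _2, _3, ... on collision.
    for name in original_names:
        if VALID_NAME.fullmatch(name) or name in forward:
            continue  # valid as-is, or a duplicate that is already mapped
        base = INVALID_CHARS.sub("_", name)[:MAX_LEN] or "_"
        candidate = base
        suffix = 2
        while candidate in used:
            tail = f"_{suffix}"
            candidate = base[: MAX_LEN - len(tail)] + tail
            suffix += 1
        used.add(candidate)
        forward[name] = candidate
        reverse[candidate] = name

    return forward, reverse
```

With this sketch, `["foo/bar", "foo_bar"]` and `["foo_bar", "foo/bar"]` both map `foo/bar` to `foo_bar_2`, and a repeated `foo/bar` is mapped only once, which are the ordering and duplicate cases the commit message calls out.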
Yassin Kortam	dbc8f5a937	helm: skip proxy startup prisma db push when migrations Job is enabled (#27200) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 16:58:53 -07:00
Yassin Kortam	618df94433	helm: increase default probe timeouts, disable debug logging by default (#27237) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 16:58:34 -07:00
Yassin Kortam	950074eea2	fix: atomic TPM rate limit (#27001) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 16:58:07 -07:00
Sameer Kankute	b8635bbc7a	feat(realtime): OpenAI Realtime GA support and beta compatibility (#27110 ) * feat(realtime): OpenAI Realtime GA support and beta compatibility - Normalize beta-style session.update to GA for upstream OpenAI; optional GA→beta event translation when client sends OpenAI-Beta: realtime=v1 - Default upstream WebSocket without OpenAI-Beta; forward header when client opts in - Extend OpenAI realtime types for GA event names and conversation item shapes - Relax LiteLLMRealtimeStreamLoggingObject.results to List[Any] for GA events - Update proxy client_secrets fallback to omit beta header; dashboard RealtimePlayground - Add unit tests for remap, translation, and beta header helper Co-authored-by: Cursor <cursoragent@cursor.com> * fix results * fix greptile * Fix mypy issues * Remove unused class constants _GA_TEXT_DELTA_TYPES and _GA_AUDIO_DELTA_TYPES These frozensets were defined as class-level constants in realtime_streaming.py but never referenced anywhere in the codebase. Removing dead code. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(realtime): use GA-shaped session.update in guardrail injections The guardrail VAD injection code sent a beta-style session.update with a flat turn_detection field: {"session": {"turn_detection": {"create_response": false}}} When the upstream OpenAI backend operates in GA mode (no OpenAI-Beta header forwarded), it requires the nested GA shape: {"session": {"type": "realtime", "audio": {"input": {"turn_detection": {"create_response": false}}}}} The _remap_beta_session_to_ga helper was only applied to client- originated session.update messages in client_ack_messages. 
Internally- generated session.updates (sent via _send_to_backend) in two paths: - _handle_raw_backend_message (raw/no provider_config path, line 518) - backend_to_client_send_messages provider_config path (line 481) bypassed the remap, so GA upstreams ignored or rejected them, breaking audio transcription guardrails for all non-beta clients. Fix: add _make_disable_auto_response_message() helper that always emits the correct GA-shaped session.update, and replace both injection sites with it. Update existing tests to assert the GA nested shape instead of the old flat beta shape, and add a new unit test for the helper itself. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * Log realtime session type * Fix beta realtime session payloads * Fix realtime audio format remapping edge case * Fix Azure realtime beta session shape --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>	2026-05-05 16:49:20 -07:00
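The beta-to-GA reshaping of `turn_detection` described in the commit above (b8635bbc7a) can be illustrated with a small sketch. It covers only the single field quoted in the message; the function name `remap_turn_detection_to_ga` is hypothetical and is not the repository's `_remap_beta_session_to_ga`, which handles more than this.

```python
from typing import Any, Dict


def remap_turn_detection_to_ga(message: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: move a flat beta-style session.turn_detection
    into the nested GA location session.audio.input.turn_detection.

    Returns a new message; the caller's dicts are not modified.
    """
    session = dict(message.get("session", {}))
    turn_detection = session.pop("turn_detection", None)

    # The GA shape quoted in the commit message carries type="realtime".
    session.setdefault("type", "realtime")

    if turn_detection is not None:
        # Copy the intermediate dicts so existing audio settings survive
        # and the input message stays untouched.
        audio = dict(session.get("audio", {}))
        audio_input = dict(audio.get("input", {}))
        audio_input["turn_detection"] = turn_detection
        audio["input"] = audio_input
        session["audio"] = audio

    return {**message, "session": session}


beta_message = {
    "type": "session.update",
    "session": {"turn_detection": {"create_response": False}},
}
ga_message = remap_turn_detection_to_ga(beta_message)
```

Here `ga_message["session"]` has the nested shape from the commit message, while `beta_message` keeps its flat form.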
harish-berri	4fec69dd1e	refactor(BaseAWSLLM): implement shared IAM cache and static credentia… (#27125 ) * refactor(BaseAWSLLM): implement shared IAM cache and static credential caching - Introduced a process-wide shared IAM cache to optimize credential management across instances. - Added a method to handle caching of static credentials, ensuring only long-lived credentials are cached. - Updated the get_credentials method to utilize the new caching mechanism for static credential flows. - Enhanced unit tests to verify the correct behavior of the shared cache and static credential usage. * refactor(BaseAWSLLM): enhance IAM credential caching and update related tests - Improved the process-wide IAM credential caching mechanism to better handle static and AssumeRole credentials. - Renamed the caching method for clarity and updated comments to reflect the new caching behavior. - Added a fixture to ensure the IAM cache is flushed between tests to prevent leakage of cached entries. - Updated unit tests to verify the correct behavior of the shared IAM cache, particularly for static credentials and role assumptions. * refactor(BaseAWSLLM): clarify IAM credential caching behavior and enhance tests - Updated documentation to specify that only static and ambient environment credentials are cached, excluding AssumeRole and other credential types. - Modified the caching logic to ensure that AssumeRole credentials are not stored in the IAM cache, requiring STS calls for each request. - Enhanced unit tests to verify that AssumeRole credentials are not cached and to ensure proper behavior of the IAM cache across different scenarios. * Code Readability improvement for aws auth path * refactor(BaseAWSLLM): enhance IAM credential caching documentation and add tests - Updated comments to clarify the behavior of the in-process IAM credential cache, specifying the TTL for static and ambient credentials. 
- Added new unit tests to verify the caching behavior for ambient environment credentials across instances and ensure that static access key sessions are constructed only once when cached. - Ensured that temporary session tokens and AWS profiles are not cached, validating the expected behavior through additional tests. * refactor(BaseAWSLLM): improve IAM credential handling and add tests for role assumption - Updated comments to clarify the behavior of IAM credential caching, particularly regarding the handling of ambient credentials and role assumptions. - Enhanced unit tests to verify that the caching mechanism correctly distinguishes between already running roles and new role assumptions, ensuring that cached environment credentials are not reused incorrectly. - Added a new test case to validate the behavior when switching roles, confirming that the system correctly uses AssumeRole when the role changes.	2026-05-05 16:47:47 -07:00
yuneng-jiang	e84282b7b3	[Infra] Bump deps (#27157) * bump: version 0.4.70 → 0.4.71 * bump: version 0.1.39 → 0.1.40 * uv lock	2026-05-05 15:58:05 -07:00


39002 Commits