litellm

Author	SHA1	Message	Date
Mateo Wang	33c363d4d4	Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic (#29847) * test(ci): extend record/replay proxy to chat, embeddings, moderations, rerank, anthropic The record/replay proxy that took the gpt-image-1 spend E2E off the live OpenAI path now fronts every provider, so the other real-provider E2Es stop paying for and depending on live calls on every commit. It keys per upstream and selects a non-OpenAI provider by a /__recorder_upstream/<host>/ path prefix carried on the model's api_base, since some litellm handlers (cohere rerank) drop custom request headers. Wired into build_and_test (chat, embeddings, moderations, image), the otel job (cohere rerank), and the anthropic-messages job via a reusable start_openai_record_replay_proxy command. Dropped the time.time()/uuid prompt cache-busters in the build_and_test chat tests, whose config has the response cache off, so identical requests are recordable. The image spend test now asserts a repeat call still bills spend, failing loudly if the proxy response cache is ever turned on. Responses, the anthropic passthrough, bedrock, and fake-endpoint tests are left live: their lifecycles, api_base assertions, providers, or fake targets make a stateless body-keyed cache either break them or add nothing. * docs(ci): note the recorder command's OpenAI default upstream and prefix override Addresses a review note: the shared start_openai_record_replay_proxy command defaults the upstream to OpenAI, so a non-OpenAI model must carry the /__recorder_upstream/<host>/ prefix on its api_base. Document that in the command description so a future caller does not assume the default follows the provider.	2026-06-06 14:33:42 -07:00
Mateo Wang	84247d954d	test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound (#29787) * test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound The dockerized spend test test_key_info_spend_values_image_generation curls the proxy for a gpt-image-1 image, which wildcard-routes to real api.openai.com on every commit; an OpenAI outage then reddens unrelated PRs and each run pays for an image. Add an in-repo record/replay reverse proxy (tests/_openai_record_replay_proxy.py) that sits between the proxy and OpenAI. The first run, and the first after the recording lapses, records live; subsequent runs replay from the shared Redis cassette store. The proxy keeps its real separate-process HTTP topology; only the image model's api_base is pointed at the recorder in CI via IMAGE_GEN_RECORDER_BASE_URL, which is unset elsewhere so it falls back to api.openai.com. Recordings lapse 24h after write and are never refreshed on read, matching the VCR persister contract, so provider drift is still caught. Replayed responses drop upstream framing/server headers (content-length, transfer-encoding, content-encoding, date, server) so the re-serving layer recomputes them, honoring the Bedrock content-length lesson. * test(ci): close recorder http client on app shutdown Add a Starlette lifespan that closes the self-created httpx.AsyncClient on teardown, and leave caller-injected clients untouched so reuse across create_app calls is not broken. Covers the unclosed-client ResourceWarning raised in review.	2026-06-05 10:27:23 -07:00
ryan-crabbe-berri	770fff7058	test(proxy): stop running real-DB tests in GitHub Actions unit jobs (#29700) * test(proxy): stop running real-DB tests in GitHub Actions unit jobs GitHub Actions unit jobs were spinning up a Postgres service container, but the only active tests that touched it either used the DB incidentally (a cargo-culted prisma_client.connect()) or were genuine integration tests mislabeled as unit. Mock the incidental ones so the proxy-db job needs no container, and move the tests that genuinely need a database (proxy management behavior, master-key-not-persisted, schema-migration sync) to CircleCI, which is already the real-infrastructure lane. * test(proxy): restore no-unexpected-startup-writes canary in master-key test Greptile noted the hash-match assertion no longer catches other unexpected startup writes (a default key, a rotation artifact). The CircleCI job gives each run a fresh DB, so a clean startup must leave the table empty; add that canary back alongside the precise master-key assertion.	2026-06-04 14:56:02 -07:00
Mateo Wang	84969aaf15	fix(ci): keep coverage rename green when a parallel node runs no tests (#29608) * fix(ci): keep coverage rename green when a parallel node runs no tests local_testing_part1 and local_testing_part2 run with parallelism 4. When CircleCI reruns only the failed tests, the failed test lands on a single node and the other nodes receive an empty bucket, so pytest never writes coverage.xml or .coverage. The unguarded "mv coverage.xml ..." then exits 1 and turns the whole job red even though the rerun passed; the next persist_to_workspace step would fail the same way on the missing paths. Guard the rename so a node with no coverage emits empty placeholders instead. coverage combine tolerates the empty files, so the downstream upload-coverage job keeps the real nodes' data intact. * fix(ci): pre-create test-results in litellm_router_testing for empty-bucket reruns litellm_router_testing also runs with parallelism 4. On a rerun of only the failed tests, a node can receive no tests, so the test command never creates test-results and the final store_test_results step can fail on the missing path. Pre-create the directory up front, matching what local_testing_part1 and part2 already do and CircleCI's own guidance for parallel reruns. * test(openai): retry wildcard chat completion on transient OpenAI 500 build_and_test reddened on test_openai_wildcard_chat_completion when the real gpt-3.5-turbo-0125 call returned an OpenAI 500 ("The server had an error while processing your request"). The base branch passed the same call concurrently, so the 500 is an intermittent OpenAI server error, not a regression. Add the same pytest-retry marker the sibling real-call tests in this file already use so a transient upstream 500 no longer fails CI.	2026-06-03 13:37:53 -07:00
yuneng-jiang	34293fa80a	ci: reproduce default-Windows wheel install to guard MAX_PATH (#29597) * ci: reproduce default-Windows wheel install to guard MAX_PATH The existing using_litellm_on_windows job installs the project with `uv sync`, an editable source install that never copies package files into a deep site-packages path, so it cannot see the 260-char MAX_PATH overflow that breaks `pip install litellm` on default Windows. The content-filter benchmark fixtures have hit that limit three times (#21941, #22039, #29536), each caught only after release. This adds a guard to the same job that builds the wheel and installs it the way an end user would: into a venv whose site-packages prefix is padded to a realistic worst-case Windows length (~100 chars), then asserts the install completes and litellm imports. Any packaged path long enough to bust MAX_PATH at that prefix is reported up front, so the check is deterministic regardless of the runner's long-path setting, while the real install also covers failure modes a length heuristic cannot (half-unpacked packages, reserved names, case collisions). This commit is the guard only; on the current tree it correctly fails because nine fixtures still exceed the limit. The rename that brings them back under it follows on this branch. * fix(packaging): shorten content-filter benchmark fixtures under MAX_PATH The 10 content-filter benchmark result fixtures used the legacy block_{topic}_-_contentfilter_({yaml}).json naming, up to 176 chars inside the wheel, which busts the Windows 260-char MAX_PATH limit once extracted under a realistic site-packages prefix and aborts `pip install litellm` on default Windows. Rename them to the short {topic}_cf.json scheme that _save_confusion_results already emits today (it splits the label on the em-dash and writes f"{topic}_cf"), matching the insults_cf.json and investment_cf.json files fixed earlier.
Re-running the eval suite now regenerates these same short names rather than recreating the long ones. This drops the longest packaged path from 176 to 128, so the guard added in the previous commit goes from red to green with a 32-char margin. * test(windows): tidy MAX_PATH guard per review Close the wheel zip via a context manager rather than leaning on refcount collection, and select the wheel under dist/ by newest mtime so a stale artifact from an earlier build cannot be tested instead of the one just produced. Also pin down the venv-depth formula with a short note: the +2 is the separator joining the venv root to "Lib" plus the trailing separator before the entry, which lands the simulated site-packages prefix at exactly 100 chars.	2026-06-03 11:28:08 -07:00
Mateo Wang	f48a87ef12	fix(ci): normalize whitespace before classname-to-path awk on test rerun (#29475)	2026-06-01 22:39:13 -07:00
ryan-crabbe-berri	a9cc6ed68c	test(e2e): cover PROXY_LOGOUT_URL redirect on Logout (#29080) * test(e2e): cover PROXY_LOGOUT_URL redirect on Logout Env-gated spec mirroring the existing serverRootPathRedirect pattern: when the proxy is booted with PROXY_LOGOUT_URL set, clicking Logout in the navbar must navigate to that external URL. The standard run_e2e.sh exports an empty value so the rest of the suite is unaffected; this spec self-skips unless the env var is populated. * test(e2e): run PROXY_LOGOUT_URL spec in the suite + harden logout assertions Boot the e2e proxy with PROXY_LOGOUT_URL set (job-level env in CircleCI and run_e2e.sh) so proxyLogoutUrl.spec.ts actually runs instead of self-skipping. Nothing else in the suite performs a logout, so this only affects the behavior under test. Harden the spec to verify the logout flow rather than a URL substring: - wait for /sso/get/ui_settings before clicking so logoutUrl is populated (otherwise window.location.href = "" silently reloads same-origin) - assert a token cookie exists first, and is cleared after logout - locate the dropdown via getByRole instead of internal antd CSS classes - stub the external destination and assert on URL origin + path prefix * test(e2e): assert exact PROXY_LOGOUT_URL on logout redirect Replace the origin + startsWith(pathname) checks with a single normalized href comparison. With PROXY_LOGOUT_URL=https://www.example.com the path was "/", so startsWith("/") matched any path and left path/query/hash unchecked. Comparing normalized hrefs pins scheme, host, port, path, query and hash while still tolerating the browser's trailing-slash/default-port normalization.	2026-05-30 18:19:04 -07:00
Sameer Kankute	f35e7eb2f6	feat(guardrails): add Microsoft Purview DLP guardrail (#24966) * feat(guardrails): add Microsoft Purview DLP guardrail * fix(guardrails/purview): raise_for_status on HTTP errors, cap scope cache, reuse executor * fix(guardrails/purview): propagate litellm_call_id as correlation_id to Purview * chore: fixes * refactor(guardrails): delegate get_user_prompt to get_last_user_message PurviewGuardrailBase duplicated AzureGuardrailBase (and OpenAIGuardrailBase) user-prompt extraction. The same logic already lived in common_utils.get_last_user_message; wire guardrail bases to that helper, fix the helper docstring, and drop its redundant self-import of convert_content_list_to_str. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(purview): make protection scope cache true LRU on hits OrderedDict.get() does not update insertion order; call move_to_end on TTL-valid cache hits so popitem(last=False) evicts least-recently-used users instead of FIFO by first insert. Add a regression test with a small max cache size. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * Fix mypy * fix(guardrails/purview): harden user-id resolution and broaden DLP text Prefer API key and proxy-injected metadata over client metadata for Entra identity. Scan full message transcript pre-call and all completion choices post-call. Align logging-only hook with the same user-id rules. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(guardrails/purview): scan /v1/completions prompt and TextChoices Normalize text-completion prompts (string or list of strings); skip token-id-only prompts. Run post-call DLP on TextCompletionResponse choices. Extend logging_only hook for text_completion. Add tests and completion_prompt_to_str helper.
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(purview-dlp): return data after DLP pass; per-call executor; dedupe text extraction async_pre_call_hook now returns the request dict after a successful check so callers match skip-path behavior. logging_hook uses a fresh ThreadPoolExecutor per invocation like Presidio to avoid single-worker starvation. Response text extraction is centralized in _completion_response_text_parts. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(purview): fix LRU cache refresh position and add Responses API scanning Two fixes to the Microsoft Purview DLP guardrail: 1. LRU cache bug (base.py): When a stale scope cache entry was re-fetched, the assignment updated the value but Python's OrderedDict.__setitem__ preserves the original insertion order for existing keys. This left the refreshed entry near the front of the dict, making it the first candidate for LRU eviction via popitem(last=False). Fix: call move_to_end(user_id) after every write to an existing key. 2. Responses API coverage gap (purview_dlp.py): Requests to /v1/responses use an 'input' field instead of 'messages' or 'prompt', so the pre-call hook returned without scanning the content. Similarly, post-call hook did not handle ResponsesAPIResponse.output. Fix: add _responses_api_input_to_str() helper and handle 'responses'/'aresponses' call types in async_pre_call_hook, async_post_call_success_hook (via _completion_response_text_parts), and async_logging_hook. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(purview): message separator, non-blocking logging_hook, TextChoices type error Three bugs fixed in the Microsoft Purview DLP guardrail: 1. get_prompt_text_for_dlp message separator (base.py) - Previously called get_str_from_messages() which concatenated all message texts with NO separator, so 'end of msg1' + 'start of msg2' became 'end of msg1start of msg2'. 
- Now joins per-message text with '\n\n' via convert_content_list_to_str(), preserving DLP pattern detection accuracy across message boundaries. 2. logging_hook blocking the event loop thread (purview_dlp.py) - Previously called future.result() which blocked the calling thread (often the event loop thread) for the entire round-trip of two sequential Microsoft Graph API calls (_compute_protection_scopes + _process_content). - Now fires and forgets: when called inside a running loop, schedules the coroutine with loop.create_task(); otherwise spawns a daemon thread. Returns (kwargs, result) immediately in both cases. - Removes unused concurrent.futures.ThreadPoolExecutor import; adds threading. 3. Incompatible assignment type error (purview_dlp.py:180) - mypy inferred 'choice' as TextChoices from the first loop body, then flagged the assignment in the second loop as incompatible with Choices. - Fixed by using distinct loop variable names: text_choice (TextChoices) and chat_choice (Choices). Tests: 7 new tests added covering the separator fix (TestGetPromptTextForDlp) and the non-blocking logging_hook (TestLoggingHookNonBlocking). Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(purview): suppress API errors in logging-only mode and scan tool-call arguments Three issues fixed: 1. _check_content except block re-raised unconditionally even when block_on_violation=False. The docstring promised 'log only - do not raise' but network/API errors always propagated. Fixed by checking block_on_violation before re-raising; when False, log a warning and continue. 2. async_logging_hook used a single try/except wrapping both the prompt and response audit calls. When the first _check_content (uploadText) raised due to an API error the second call (downloadText) was silently skipped. Fixed by giving each audit call its own try/except so both always run independently. 3. 
convert_content_list_to_str() only reads message.content, so tool_calls[].function.arguments and function_call.arguments were invisible to the Purview pre-call and post-call scans. An authenticated caller could embed sensitive text in tool-call arguments and bypass DLP. Fixed by: - Adding PurviewGuardrailBase._extract_tool_call_args_from_message() which handles both dict and object-style messages, covering both tool_calls[] arrays and the legacy function_call field. - Updating get_prompt_text_for_dlp() to include those arguments alongside message content (request/prompt path). - Changing _completion_response_text_parts() from @staticmethod to an instance method and adding tool-call argument extraction for ModelResponse choices (response path). Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * chore(ui): restructure pre-built Next.js output to directory-based routing Flat page files (e.g. guardrails.html) replaced by directory-based index.html equivalents (e.g. guardrails/index.html) matching the Next.js App Router output format. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(purview): comprehensive security hardening — identity spoofing, streaming bypass, token-id gap Four security issues addressed: 1. end_user_id kwargs fallback missing in _resolve_user_id_from_logging_kwargs user_id already fell back to kwargs.get("user_api_key_user_id") when absent from metadata, but end_user_id only checked md.get("user_api_key_end_user_id") with no kwargs-level fallback. Added or kwargs.get("user_api_key_end_user_id"). 2. Streaming responses bypassed post_call blocking async_post_call_success_hook only runs on assembled non-streaming responses. For streaming requests the proxy already delivered all content before the hook ran, so raising HTTPException there had no effect. 
Added async_post_call_streaming_iterator_hook which buffers the entire stream, assembles it via stream_chunk_builder, runs the Purview DLP check, and only then re-yields chunks via MockResponseIterator. If a violation is detected the exception is raised before any bytes reach the client. The proxy automatically skips async_post_call_success_hook for guardrails that define this method, preventing duplicate scans. 3. Caller-controlled Purview user identity in blocking modes When a LiteLLM API key has no bound user_id the guardrail fell back to metadata[user_id_field], which is supplied by the caller. A caller could set this to any Entra object ID whose Purview policies are more permissive and bypass DLP. Added _resolve_trusted_user_id() that only returns identities from the proxy auth system (user_api_key_dict.user_id, end_user_id, or proxy-injected metadata["user_api_key_user_id"]). Added _resolve_user_id_for_blocking() used by all blocking-mode hooks: tries trusted sources first; if only caller-supplied is available, logs a SECURITY WARNING and still proceeds (backward compat); if nothing resolves, skips with a warning. 4. Token-id prompt DLP bypass When /v1/completions received a pure token-id array prompt, completion_prompt_to_str() returned None and the pre_call hook silently skipped the Purview scan. An authenticated caller could tokenize blocked text and send it without DLP evaluation. The hook now detects this case (raw_prompt present but prompt_text None) and logs a WARNING while letting the request pass through — token-id payloads are opaque at the text layer and cannot be scanned. This makes the gap explicit rather than silent. Tests: 94 total, all passing. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * Revert "chore(ui): restructure pre-built Next.js output to directory-based routing" This reverts commit c70c4303b735bb3885732bd4a0e01997e9571f56. 
* fix(purview): fail closed on identity spoofing, token prompts, and path encoding Encode Entra user IDs in Graph paths, guard caches with asyncio.Lock, scan Responses API instructions with string input, reject caller-only metadata and token-id completion prompts in blocking mode, and revert unrelated UI HTML restructure from the PR branch. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(purview): use threading.Lock and getattr for LitellmParams - Replace asyncio.Lock with threading.Lock in PurviewGuardrailBase. The cache lock is acquired both from the proxy's main event loop and from short-lived event loops created by the logging_hook thread fallback. In Python 3.10+ an asyncio.Lock is bound to the first event loop that acquires it, so the second loop would silently break audit logging with RuntimeError. All critical sections are in-memory dict ops with no awaits, so a synchronous lock is safe. - Use getattr() on LitellmParams in initialize_guardrail() instead of .get(), which does not exist on Pydantic BaseModel instances and would raise AttributeError at runtime. Tests updated to construct Mock objects with spec= so they reflect the real interface. Co-authored-by: Yassin Kortam <yassin@berri.ai> * refactor(purview): dedupe trust-level user resolution and drop dead code - _resolve_user_id now delegates levels 1-3 to _resolve_trusted_user_id so blocking and non-blocking paths share a single source of truth. - Drop redundant event_hook override in MicrosoftPurviewDLPGuardrail.__init__ (initialize_guardrail already forwards event_hook=litellm_params.mode). - Drop unused self._logging_only attribute; blocking is controlled by the block_on_violation argument passed to _check_content. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview): fail-closed on responses API transform error; avoid duplicate audit calls Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview): fail-closed blocking DLP; revert directory-based UI HTML Blocking hooks now require UserAPIKeyAuth user_id/end_user_id only (no spoofable metadata), re-raise Responses API transform errors, scan streamed text completions, and reject requests with no bound identity. Reverts the accidental directory-based Next.js output from cc47081 (c70c4303b7). Co-authored-by: Cursor <cursoragent@cursor.com> * Remove dead code in purview_dlp: _resolve_user_id_for_blocking never returns falsy The method either returns a non-empty trusted user id or raises HTTPException, so the 'if not user_id' guards in async_pre_call_hook and async_post_call_success_hook were unreachable. Tighten the return type to str and drop the dead checks to make the fail-closed behavior explicit. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview): exclude caller-controlled end_user_id from blocking DLP Blocking Purview checks now use only API-key/JWT-bound user_id, not end_user_id populated from request user/metadata/safety_identifier. Co-authored-by: Cursor <cursoragent@cursor.com> * style(purview): apply Black formatting to base.py Co-authored-by: Cursor <cursoragent@cursor.com> * fix(purview): use post-await timestamp for cache TTL Capture the timestamp after the network call completes when storing it as the cache freshness marker, so the effective TTL reflects when the response was actually received rather than when the request started. Under high network latency the previous behavior shortened the effective cache lifetime. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview_dlp): fail closed when stream_chunk_builder returns None stream_chunk_builder can return None (e.g., when ChunkProcessor filters all chunks), causing both isinstance checks to fail and the buffered chunks to be released without DLP scanning. Explicitly fail closed in that case by raising an HTTPException so the streaming DLP guardrail does not bypass policy enforcement. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview_dlp): resolve user_id before buffering stream Co-authored-by: Yassin Kortam <yassin@berri.ai> * merge main (#28629) * test(vcr): classify cache verdicts, detect live calls, surface cost leaks Convert the per-test VCR verdict line from a single 'NOOP / HIT / MISS / PARTIAL' tag into a classified outcome that distinguishes the cases that silently bill the live API on every CI run from the ones that don't: HIT pure replay PARTIAL mixed replay + new recordings MISS:RECORDED new cassette saved to Redis (cached next run) MISS:OVERFLOW cassette > MAX_EPISODES_PER_CASSETTE; persister refused to save; re-bills every run MISS:NOT_PERSISTED test failed; save_cassette skipped; re-bills NOOP VCR-marked but no HTTP traffic (mocked elsewhere) UNMARKED:LIVE_CALL test bypassed VCR AND opened a TCP connection to a known LLM provider host -> wasted spend UNMARKED:NO_TRAFFIC test bypassed VCR but didn't call out The UNMARKED:LIVE_CALL signal is what converts 'this test probably hits live' into 'this test connected to api.openai.com'. We install a socket.connect / socket.create_connection wrapper for the duration of each non-VCR-marked test and record any outbound TCP to a known LLM provider hostname. The probe sits below the httpx layer so vcrpy and respx (which both patch above the socket) are unaffected. Replace the file-level _RESPX_CONFLICTING_FILES blacklists in the llm_translation and local_testing conftests with per-item respx detection in apply_vcr_auto_marker_to_items. 
A test now skips VCR when it actually carries @pytest.mark.respx or has respx_mock in its fixture chain - not just because some other test in the same file imports MockRouter. Items skipped by skip_files are split into respx_conflict (real conflict, the module wires up respx) vs file_opt_out (dead skip-list entry whose module never touches respx) so the session summary makes pruning obvious. Stabilize the AWS SigV4 fingerprint: the Authorization header on Bedrock requests rotates its Credential date and Signature on every call, which previously pushed every Bedrock test past the 50-episode overflow threshold. Extract the access-key id only ('aws-sigv4:AKIA...') so two requests with the same identity match. Always emit verdict logging when VCR is active (set LITELLM_VCR_VERBOSE=0 to opt back into the legacy quiet mode). Add a session-end classification summary that lists overflow tests, unmarked live-call tests, and the skip-reason breakdown. Wire the live-call probe + summary hook into every test directory that already uses the Redis-backed VCR cache (audio_tests, guardrails_tests, image_gen_tests, litellm_utils_tests, llm_responses_api_testing, llm_translation, local_testing, logging_callback_tests, ocr_tests, pass_through_unit_tests, router_unit_tests, search_tests, unified_google_tests). Add tests/llm_translation/test_vcr_classification.py covering the verdict classifier, skip-reason tagging, AWS SigV4 fingerprint stability, live-host classification, and session summary rendering. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): drop dead 'from respx import MockRouter' imports These seven test files were on _RESPX_CONFLICTING_FILES, which made the auto-marker skip them entirely. Inspecting the source shows the only respx artifact is a top-level 'from respx import MockRouter' that no test ever uses - no @pytest.mark.respx, no respx_mock fixture, no respx.mock context manager.
The import is dead code left over from a previous mocking pattern. Now that apply_vcr_auto_marker_to_items detects respx per-item via the marker / fixture chain (`b637d9f64a`), the file-level skip is no longer needed for these files - they were the reason the OpenAI tests (test_o3_reasoning_effort, test_streaming_response[o1/o3-mini], TestOpenAIO1::test_streaming, TestOpenAIChatCompletion::test_web_search, TestOpenAIO3::test_web_search, etc.) ran live every CI build despite the cassette cache being healthy. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(image_edits): regenerate fixtures per call instead of holding open module-level file handles Module-level TEST_IMAGES = [ open(os.path.join(pwd, 'ishaan_github.png'), 'rb'), open(os.path.join(pwd, 'litellm_site.png'), 'rb'), ] SINGLE_TEST_IMAGE = open(...) opens the file once at import. After the first multipart upload, the file pointer is at EOF, so every subsequent test in the same xdist worker sends an empty multipart body. That non-determinism (a) blows the recorded cassette past MAX_EPISODES_PER_CASSETTE (50) so _RedisPersister.save_cassette refuses to save it, and (b) re-bills the live image edit endpoint on every CI run. Recent CI runs confirm the leak: tests/image_gen_tests/test_image_edits.py shows six tests parking at 51-52 cassette entries (TestOpenAIImageEditGPTImage1::test_openai_image_edit_litellm_sdk[False], TestOpenAIImageEditDallE2::..., test_openai_image_edit_with_bytesio, test_openai_image_edit_litellm_router, test_multiple_vs_single_image_edit[False], test_multiple_image_edit_with_different_formats). Replace the module-level file handles with _make_test_images() / _make_single_test_image() factories that return fresh _RewindableImage (BytesIO subclass) objects whose pointer always starts at 0. The image bytes are read once at import into module-level constants (_ISHAAN_GITHUB_BYTES, _LITELLM_SITE_BYTES), so disk I/O cost is unchanged. 
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(vcr): match real Bedrock hostnames in live-call probe The suffix '.bedrock-runtime.amazonaws.com' never matched real Bedrock endpoints, which use the format 'bedrock-runtime[-fips].{region}.amazonaws.com' (region between 'bedrock-runtime' and 'amazonaws.com'). Add an explicit host check for that pattern so Bedrock live calls are visible to the probe, and update the unit test accordingly. Also drop the unused '_LIVE_CALL_PROBE_INSTALLED' module variable. * fix(vcr): cover full RFC1918 172.16.0.0/12 range in local prefixes * fix(image_edits): drop _RewindableImage to prevent infinite multipart upload The _RewindableImage(BytesIO) wrapper auto-rewound on every read after EOF, which made the OpenAI SDK's multipart upload writer read the same bytes forever instead of seeing EOF. Workers OOM'd / SIGKILL'd: [gw0] node down: Not properly terminated replacing crashed worker gw0 ... worker 'gw1' crashed while running 'tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditGPTImage1::test_openai_image_edit_litellm_sdk[False]' The auto-rewind was added defensively for parametrized + flaky-retried tests, but BaseLLMImageEditTest::test_openai_image_edit_litellm_sdk already calls get_base_image_edit_call_args() once per invocation and that helper now constructs fresh streams via _make_test_images(), so rewinding inside the stream is unnecessary. Replace with plain BytesIO seeded with the cached image bytes. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): mark Bedrock prompt-caching cross-call tests VCR-incompatible The pass_through prompt-caching tests (test_prompt_caching_returns_cache_read_tokens_on_second_call, test_prompt_caching_streaming_second_call_returns_cache_read) make a warm-up call and then assert the second call sees a non-zero cache_read_input_tokens count from the upstream's prompt-cache. 
VCR replay can't model cross-call provider state — both calls match the same cassette episode, so the second call returns the first call's pre-warmup response and the assertion fails: AssertionError: Expected cache_read_input_tokens > 0 on second call, but got 0. Full usage: {'input_tokens': 4986, 'cache_creation_input_tokens': 4974, 'cache_read_input_tokens': 0} This started biting after the AWS SigV4 fingerprint stabilization (`b637d9f64a`): Bedrock requests now produce a stable per-access-key fingerprint instead of a per-request signature, so cassettes successfully replay where they previously always missed and re-recorded live. Opt these tests out via skip_nodeid_suffixes so they run live and match the existing pattern in tests/llm_translation/conftest.py (::test_prompt_caching). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): tighten OVERFLOW classification and switch respx detection to AST Address two greptile P2 review concerns on PR #27795: 1. MISS:OVERFLOW was firing whenever total > MAX_EPISODES_PER_CASSETTE regardless of cassette state. A cassette that grew past the cap historically but this run only replayed (dirty=False) is healthy — the persister never tries to save, so the cache state is stable and the next run will replay too. Only flag OVERFLOW when dirty=True (new episodes were recorded that the persister would refuse to save). Add a regression test covering the dirty=False + large-total case. 2. _module_uses_respx did substring matching on the module source, which false-positives on comments / docstrings / string literals. A comment like # Previously tried respx.mock but switched to vcrpy would keep a file pinned on the opt-out list, defeating the dead-import pruning goal of this PR. Replace the substring scan with an ast.NodeVisitor (_RespxUsageVisitor) that only counts: - @pytest.mark.respx / @respx.mock decorators - with respx.mock(): ... (sync + async) context managers - respx.mock(...) 
calls outside a with/decorator - function parameters / fixture names equal to respx_mock Add tests for the comment / docstring / string-literal cases plus each real-usage pattern. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(vcr): aggregate worker stats on the controller so the session summary actually renders under xdist `_session_stats` is a module-level dict mutated inside `_vcr_outcome_gate` — which runs in each xdist worker process. The controller's `pytest_terminal_summary` then reads its own empty `_session_stats` and bails on `if not counts: return`, so the OVERFLOW / LIVE_CALL sections the rest of this PR adds never make it into CI logs in the dist mode CI actually uses. Ship a structured `vcr_outcome` payload via `user_properties` (which xdist round-trips) and add `aggregate_report_outcome` on the controller to fold worker outcomes into `_session_stats`. The recording process tags `vcr_recorded_by` with `PYTEST_XDIST_WORKER` so the controller can tell "single-process — already counted locally" apart from "produced by a worker — needs aggregation here", and not double-count when there's no xdist. Covered by 9 new unit tests in test_vcr_classification.py including the end-to-end summary render path. 
* fix(guardrails): improve CrowdStrike AIDR input handling (#26658) * feat(lasso): add tool-calling support to LassoGuardrail (#27648) * feat(lasso): extend LassoGuardrail to support tool calling (RND-5748) * fix(lasso): PR review followups for tool-calling guardrail (RND-5748) * fix(lasso): handle object-style tool_calls in _update_tool_calls_from_masked (RND-5748) * fix(lasso): use model role for tool_use blocks (RND-5748) * test(lasso): add round-trip tests for message transformation (RND-5748) * fix(lasso): remove unused imports, handle Responses-API input masking, flatten multimodal content (RND-5748) * fix(lasso): inspect Responses-API input field (RND-5748) * fix(lasso): guard text-cursor remap against Lasso count mismatch (RND-5748) * fix(lasso): flatten list content in tool_result.content (RND-5748) * fix(lasso): remap multimodal list content during masking (RND-5748) Bug: _map_masked_messages_back counted list-content messages in original_text_count but the remap loop only handled isinstance(str). The positional text_cursor never advanced for list messages, causing all subsequent masked texts to be written onto the wrong messages. Fix: added elif isinstance(content, list) branch that replaces the list with the masked text string and advances the cursor — mirrors the existing string-content branch. Also handles the assistant + tool_calls combo for list-content messages. Test: test_map_masked_messages_back_list_content verifies a user message with [text + image_url] followed by an assistant message gets correct masked content on both (cursor stays aligned). * refactor(lasso): extract _get_field and _extract_tool_call_fields helpers (RND-5748) The dict-vs-object access pattern (x.get('y') if isinstance(x, dict) else getattr(x, 'y', None)) was duplicated 14 times across 5 methods. _get_field(obj, field) — single-point dict/Pydantic field access. 
_extract_tool_call_fields(call) — returns (call_id, name, parsed_input) with JSON argument parsing, replacing ~30 duplicate lines in both async_post_call_success_hook and _expand_messages_for_classification. Also simplified _update_tool_calls_from_masked, _prepare_payload tool mapping, and _apply_masking_to_model_response call_id extraction. Net ~60 lines removed. No behavior change — all 32 tests pass. * fix(lasso): add count guard to _apply_masking_to_model_response (RND-5748) _apply_masking_to_model_response used a bare text_cursor without verifying 1:1 correspondence between text-bearing choices and masked text entries. If Lasso returned a different number of text messages than choices with content, masked text would be applied to the wrong choice or silently skip choices. Added the same count-mismatch guard pattern already used in _map_masked_messages_back: count original text-bearing choices, compare to masked_text length, skip text remap on mismatch with a warning log. Tool_call masking via id-based lookup is unaffected. Tests: - test_apply_masking_to_model_response_multiple_choices: verifies correct per-choice masked text with 2 choices - test_apply_masking_to_model_response_count_mismatch: verifies content is left unchanged when counts disagree * fix(lasso): close two guardrail-bypass paths flagged in review (RND-5748) * tool-call args: when function.arguments is malformed JSON or parses to a non-object, preserve the raw string as {"arguments": <raw>} so Lasso still inspects it instead of receiving input=None. Covers both pre-call and post-call extraction (shared helper). Also resolves the CodeQL empty-except warning since the except body now assigns parsed=None. * Responses-API input: when a request carries both "messages" and "input", inspect both. Previously a benign messages array let the guardrail skip data["input"] entirely. 
The masking write-back is split via a count boundary so masked messages flow back to data["messages"] and masked input flows back to data["input"] without cross-contamination. Tests: malformed/non-object args round-trip, dual-field classification, dual-field masking write-back split. * chore(lasso): black formatting + comment on expand skip branch (RND-5748) * black: wrap two long expressions in lasso.py and reformat dict literals in test_lasso.py to satisfy CI lint. * add a short comment in _expand_messages_for_classification explaining why empty string and None content are intentionally skipped (None is the OpenAI shape for a pure tool-call turn). * fix(lasso): satisfy mypy in _handle_masking, _update_tool_calls_from_masked, _apply_masking_to_model_response (RND-5748) * Narrow `response.get("messages")` into a local before slicing so mypy doesn't see `Optional[List[Dict[str, str]]]` as non-indexable. * Rename the two write-side `func` bindings in `_update_tool_calls_from_masked` to `func_dict` / `func_obj` so mypy doesn't unify the dict and Any\|None branches. * Rename the inner loop variable in `_apply_masking_to_model_response` from `msg` to `masked_msg` to avoid clashing with the `msg = choice.message` rebinding below. No behavior change; resolves the 7 mypy errors from the CI lint job. 
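The `_get_field` / `_extract_tool_call_fields` refactor and the malformed-argument fallback from the bypass fix above can be sketched together. This is an illustrative reimplementation under assumptions (OpenAI-style tool calls shaped `{"id", "function": {"name", "arguments"}}`; the un-prefixed names `get_field` and `extract_tool_call_fields` are hypothetical stand-ins), not the code in `lasso.py`.

```python
import json
from types import SimpleNamespace
from typing import Any, Optional, Tuple


def get_field(obj: Any, field: str) -> Any:
    """Read `field` from a dict or from an attribute-style object such as a Pydantic model."""
    if isinstance(obj, dict):
        return obj.get(field)
    return getattr(obj, field, None)  # also yields None when obj itself is None


def extract_tool_call_fields(call: Any) -> Tuple[Optional[str], Optional[str], Optional[dict]]:
    """Return (call_id, name, parsed_input) for an OpenAI-style tool call.

    Arguments that are malformed JSON, or valid JSON but not an object, are
    kept as {"arguments": raw} so the guardrail still sees them instead of None.
    """
    func = get_field(call, "function")
    call_id, name = get_field(call, "id"), get_field(func, "name")
    raw = get_field(func, "arguments")
    if raw is None:
        return call_id, name, None
    if isinstance(raw, dict):
        return call_id, name, raw
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        parsed = None
    if not isinstance(parsed, dict):
        parsed = {"arguments": raw}
    return call_id, name, parsed


# Dict-shaped and object-shaped tool calls go through the same code path
dict_call = {"id": "c1", "function": {"name": "lookup", "arguments": '{"q": "x"}'}}
obj_call = SimpleNamespace(id="c2", function=SimpleNamespace(name="lookup", arguments="not json"))
```

Funnelling every dict-vs-object read through one accessor is what lets the same extraction serve both the pre-call and post-call paths.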
* perf: eliminate per-request callback scanning on proxy hot path (#27858) - Introduce `_CallbackCapabilities` dataclass and `ProxyLogging._callback_capabilities()` static method that inspects `litellm.callbacks` once and caches capability flags keyed on (list length, member ids); invalidates automatically when the callback list mutates without per-request iteration overhead - Replace O(n) `litellm.callbacks` walks in `async_pre_call_hook`, `during_call_hook`, `async_post_call_streaming_iterator_hook`, `async_post_call_streaming_hook`, and `post_call_response_headers_hook` with fast-path exits when no relevant callbacks are registered - Add `needs_iterator_wrap()` and `needs_per_chunk_streaming_hook()` instance methods to decouple iterator-level wrapping from per-chunk hook execution; avoids `get_response_string` materialization per chunk when no guardrail or chunk-hook callback is active - Introduce `_fast_serialize_simple_model_response_stream()` using `orjson` for common single-choice text streaming chunks, bypassing the full Pydantic serializer; falls back to `model_dump_json` for tool calls, logprobs, usage, and provider-specific fields - Add early-return in `_restamp_streaming_chunk_model` when downstream model already matches the requested model, avoiding unnecessary string comparisons on every chunk - Fix stale zero-cost cache bug in `_is_model_cost_zero`: move the per-router `_zero_cost_cache` dict onto the `Router` instance and clear it in `_invalidate_model_group_info_cache` so in-place pricing updates via `upsert_deployment` immediately resume budget enforcement - Add `scripts/benchmark_chat_completions_perf.py`: standalone async benchmarking tool with a mock OpenAI provider, LiteLLM proxy process management, non-streaming RPS, streaming TTFT, and full-stream latency measurements with repeat/median run support - Add comprehensive unit tests covering capability detection, cache invalidation, fast-path correctness, zero-cost cache regression, and the 
no-callback streaming fast path Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> * ci(mutmut): enable mutate_only_covered_lines to fit in CI budget (#27910) The mutation-test workflow timed out at the 350-minute job cap when running whole-folder mutation against litellm/proxy/management_endpoints/ (~30 files, ~1.5 MB of source). Every mutant was running the full test suite, and mutants were generated for lines no test covers — which would survive regardless, just wasting compute. mutmut 3.x's mutate_only_covered_lines setting runs the suite once up front to compute coverage, then skips mutating uncovered lines. This cuts the mutant count dramatically and is the right semantic for the score (no test → no kill possible → uncountable). Per-mutant test filtering by function name is already automatic in mutmut 3.x; no external coverage step is needed. * fix(rate-limit): stop v3 limiter from leaking internal stash to provider body (#27913) * fix(rate-limit): stop v3 limiter from leaking internal stash to provider body PR #27001 (atomic TPM rate limit) introduced a reservation flow that writes five LiteLLM-internal keys onto the request data dict: _litellm_rate_limit_descriptors _litellm_tpm_reserved_tokens _litellm_tpm_reserved_model _litellm_tpm_reserved_scopes _litellm_tpm_reservation_released These keys are forwarded as request body params to the upstream provider, which rejects them as unknown fields: OpenAI -> 400 'Unknown parameter: _litellm_rate_limit_descriptors' (mapped by litellm to RateLimitError / 429, hiding the bug behind a misleading 'throttling_error' code) Anthropic -> 400 '_litellm_rate_limit_descriptors: Extra inputs are not permitted' Net effect: every chat completion against any real provider fails the moment a virtual key has any tpm_limit / rpm_limit set — i.e. v3-enforced key-level TPM/RPM limits are broken end-to-end. The v3 RPM/TPM check itself still runs (raises 429 on over-limit), but the success path poisons the upstream body.
Reproduced on litellm_internal_staging HEAD (`410ce761dc`) against gpt-4o-mini and claude-haiku-4-5 with a 1-RPM/1-TPM key — first request fails with the provider's unknown-field error. Fix: the stash is metadata only. - Add RATE_LIMIT_DESCRIPTORS_KEY constant and a _LITELLM_STASH_KEYS registry so we have a single source of truth for stash keys. - New helper _stash_value_in_metadata_channels writes to data['metadata'] / data['litellm_metadata'] without touching the top level. - _stash_reservation_in_data and the descriptor stash now route through that helper. _mark_reservation_released stops writing top-level. - _lookup_stashed_value also checks kwargs['metadata'] / kwargs['litellm_metadata'] (raw request_data shape) in addition to kwargs['litellm_params']['metadata'] (completion kwargs shape). - async_post_call_failure_hook now reads descriptors via the unified metadata lookup instead of request_data.get(top-level). - Defense in depth: async_pre_call_hook strips any stash key that somehow surfaced at the top level (stale cache, future refactor, test fixture) before returning. Tests: - New regression test asserts no _litellm_* stash key is present at the top level of data after async_pre_call_hook, and that the metadata channel still carries the reservation + descriptors so success / failure reconciliation works. - Existing test_tpm_concurrent.py tests that asserted top-level presence are updated to read from data['metadata'] — the location is an implementation detail; the spec is that post-call callbacks can resolve the stash. Verified end-to-end against OpenAI gpt-4o-mini and Anthropic claude-haiku-4-5 via /v1/chat/completions on a low-rpm key: - With limits not exceeded: HTTP 200, valid completion response, no leaked fields in body. - With RPM exceeded: HTTP 429 from v3 enforcement ('Rate limit exceeded ... Limit type: requests'). - With TPM exceeded: HTTP 429 from v3 enforcement ('Rate limit exceeded ... Limit type: tokens'). 
Full v3 hook test suite passes (171 tests). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * chore(rate-limit): use RATE_LIMIT_DESCRIPTORS_KEY constant in test, trim noisy comments Address greptile P2: test fixture now uses the imported constant. Drop comments that re-explain what well-named identifiers already convey. * fix(rate-limit): reject caller-supplied stash values to prevent TPM-refund abuse Strip _LITELLM_STASH_KEYS from data top-level and both metadata channels at the start of async_pre_call_hook. Without this, an authenticated caller can inject _litellm_rate_limit_descriptors plus _litellm_tpm_reserved_tokens in body metadata, trigger a proxy-side rejection, and cause async_post_call_failure_hook to refund TPM counters against attacker-named scopes (e.g. another tenant's api_key). --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix: allow for allowlisted redirect URIs (#27761) * fix: allow for allowlisted redirect URIs * github comment addressing * Update litellm/proxy/_experimental/mcp_server/oauth_utils.py Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * harden oauth wildcard further * test: cover wildcard entry with dot-leading suffix rejection --------- Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * Emit native web_search_tool_result blocks for Anthropic clients (Claude Desktop / Cowork citations) (#27886) * feat(custom_logger): add async_post_agentic_loop_response_hook Lets a CustomLogger shape the response returned by the agentic-loop follow-up call without bypassing the loop's safety / observability machinery (depth tracking, fingerprinting, etc.). Default returns the response unchanged. Used by websearch_interception to inject Anthropic-native web_search_tool_result blocks when the originating client requested a native web_search_* tool. 
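Returning to the rate-limit fix (#27913): the "stash is metadata only" rule and the later rejection of caller-supplied stash values can be sketched together. The function names and the policy of writing into whichever metadata channel already exists (else creating `metadata`) are assumptions for illustration, not the limiter's actual code.

```python
from typing import Any, Dict

# Hypothetical registry holding the five stash keys named in the commit message
STASH_KEYS = frozenset(
    {
        "_litellm_rate_limit_descriptors",
        "_litellm_tpm_reserved_tokens",
        "_litellm_tpm_reserved_model",
        "_litellm_tpm_reserved_scopes",
        "_litellm_tpm_reservation_released",
    }
)
METADATA_CHANNELS = ("metadata", "litellm_metadata")


def strip_caller_stash(data: Dict[str, Any]) -> None:
    """Remove caller-supplied stash keys from the top level and both metadata channels."""
    for key in STASH_KEYS:
        data.pop(key, None)
    for channel in METADATA_CHANNELS:
        meta = data.get(channel)
        if isinstance(meta, dict):
            for key in STASH_KEYS:
                meta.pop(key, None)


def stash_in_metadata(data: Dict[str, Any], key: str, value: Any) -> None:
    """Record a stash value in the metadata channels only, never at the body's top level.

    Assumption: write into every channel that already exists, else create "metadata".
    """
    wrote = False
    for channel in METADATA_CHANNELS:
        meta = data.get(channel)
        if isinstance(meta, dict):
            meta[key] = value
            wrote = True
    if not wrote:
        data["metadata"] = {key: value}


# A caller tries to smuggle stash values in, both at top level and in metadata
request = {
    "model": "gpt-4o-mini",
    "_litellm_tpm_reserved_tokens": 10_000,
    "metadata": {"_litellm_rate_limit_descriptors": ["someone-elses-key"]},
}
strip_caller_stash(request)  # runs first, before the limiter writes anything
stash_in_metadata(request, "_litellm_tpm_reserved_tokens", 42)
```

Stripping before stashing is the important ordering: anything the limiter later reads back from metadata is guaranteed to be a value it wrote itself.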
* feat(llm_http_handler): call post-agentic-loop hook on the originating callback In _execute_anthropic_agentic_plan, after anthropic_messages.acreate returns, call the originating callback's async_post_agentic_loop_response_hook so it can mutate the final response (e.g. inject native tool_result blocks). Pass the callback through from _call_agentic_completion_hooks. Exceptions in the post-hook are caught and logged so a buggy callback can't kill the request. * feat(websearch_interception): add is_anthropic_native_web_search_tool Identifies tools the Anthropic-native clients (Claude Desktop, the Anthropic SDK, the Anthropic Console) use to request native search: type starts with "web_search_" (e.g. web_search_20250305). Rejects the LiteLLM standard tool, the OpenAI-function variant, the bare "WebSearch" legacy name, and the bare "web_search" Claude Code shape. This lets us decide per-request whether the client expects web_search_tool_result content blocks in the response, without renaming any existing constants or touching native-provider skip logic. * feat(websearch_interception): add build_web_search_tool_result_block Produces the Anthropic-native web_search_tool_result content block from a structured SearchResponse. Anthropic-native clients use this block to populate citations / source links — the existing text-blob flatten path only feeds readable evidence to the model and discards the structure, so this builder gives us the missing piece. Shape matches https://docs.anthropic.com/en/api/web-search-tool — web_search_result items carry url, title, page_age, encrypted_content (empty string when the search provider doesn't supply one). 
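Both helpers in the two commits above are small enough to sketch. The names below are hypothetical stand-ins for `is_anthropic_native_web_search_tool` and `build_web_search_tool_result_block`, and the builder takes a plain list of result dicts instead of LiteLLM's `SearchResponse`, whose fields are not shown here. The block shape follows the Anthropic web search tool documentation cited in the commit.

```python
from typing import Any, Dict, List


def looks_like_native_web_search_tool(tool: Any) -> bool:
    """True only for Anthropic server tools whose type starts with "web_search_".

    Function-style tools, a bare {"name": "WebSearch"}, and a tool whose type is
    exactly "web_search" (no version suffix) or missing all return False.
    """
    if not isinstance(tool, dict):
        return False
    tool_type = tool.get("type")
    return isinstance(tool_type, str) and tool_type.startswith("web_search_")


def web_search_tool_result_block(tool_use_id: str, results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a web_search_tool_result content block from plain result dicts.

    Assumes each result has "url" and optionally "title", "page_age" and
    "encrypted_content"; a missing encrypted_content becomes "".
    """
    return {
        "type": "web_search_tool_result",
        "tool_use_id": tool_use_id,
        "content": [
            {
                "type": "web_search_result",
                "url": r["url"],
                "title": r.get("title", ""),
                "page_age": r.get("page_age"),  # assumption: None when the provider omits it
                "encrypted_content": r.get("encrypted_content") or "",
            }
            for r in results
        ],
    }
```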
* feat(websearch_interception): emit native web_search_tool_result blocks When the originating client request carried a native Anthropic web_search_* tool, the final response now also carries web_search_tool_result content blocks alongside the model's text answer — so Claude Desktop / Anthropic SDK clients can populate the citations panel and replay conversation history with structured search evidence. Wiring: - Pre-request hooks (both deployment + Anthropic path) set a flag on kwargs when they see a native web_search_* tool, so the signal survives the conversion-to-litellm_web_search step regardless of which hook fires first. - _execute_search now returns (text, SearchResponse) so the structured results aren't lost when the text is flattened for the follow-up model call. - _build_anthropic_request_patch returns the parallel list of SearchResponse objects. - async_build_agentic_loop_plan pre-builds the web_search_tool_result blocks (one per tool_use_id) and stashes them on plan.metadata when the flag is set. - async_post_agentic_loop_response_hook reads the metadata and prepends the blocks to response.content. - _execute_agentic_loop mirrors the injection for the legacy path so both paths behave identically. Clients that send the LiteLLM standard tool keep the existing text-only behavior — no regression. 
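The final injection step, prepending pre-built blocks to `response.content`, can be sketched on its own. Handling both dict and attribute-style responses mirrors the "dict and object responses" test cases listed in the next commit. The helper name is hypothetical, and how the real hook retrieves the blocks from `plan.metadata` is not modeled.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def prepend_native_blocks(response: Any, blocks: List[Dict[str, Any]]) -> Any:
    """Prepend pre-built web_search_tool_result blocks to a response's content.

    Handles dict-shaped responses and attribute-style response objects, and
    returns the response untouched when there is nothing to inject.
    """
    if not blocks:
        return response
    if isinstance(response, dict):
        response["content"] = list(blocks) + list(response.get("content") or [])
    else:
        response.content = list(blocks) + list(getattr(response, "content", None) or [])
    return response


native = [{"type": "web_search_tool_result", "tool_use_id": "srvtoolu_1", "content": []}]
as_dict = prepend_native_blocks({"content": [{"type": "text", "text": "answer"}]}, native)
as_obj = prepend_native_blocks(SimpleNamespace(content=[{"type": "text", "text": "answer"}]), native)
```

Returning early on an empty block list is what keeps clients that sent the LiteLLM standard tool on the unchanged text-only path.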
* test(websearch_interception): cover native web_search_tool_result emission 18 tests across: - detector branches (native vs litellm-standard, OpenAI-function shape, Claude Desktop builtin WebSearch, bare web_search, missing type) - block-builder shape (results, none, empty) - pre-request hook flag-setting (native sets, standard does not) - async_build_agentic_loop_plan attaches blocks to plan.metadata when the flag is present, leaves metadata untouched when absent - post-hook injection into dict and object responses - legacy _execute_agentic_loop mirrors the injection so both paths return the same shape * test(websearch_short_circuit): keep _execute_search mocks in sync with new tuple return * test(websearch_thinking_constraint): keep _execute_search mocks in sync with new tuple return * feat(websearch_interception): emit native blocks from try_short_circuit_search The agentic-loop post-hook only fires when the model returns a tool_use block. Cowork / Claude Desktop on Bedrock actually make TWO requests per user turn: the main /v1/messages with their builtin tool, and a separate standalone /v1/messages whose only tool is web_search_20250305. That second request hits try_short_circuit_search — no agentic loop, no post-hook — and was returning text-only, leaving the citations panel empty. When the short-circuit input carries a native web_search_* tool, build a synthetic server_tool_use + web_search_tool_result pair (using the structured SearchResponse already returned by _execute_search) so the client gets the native shape it expects. The legacy text block is preserved so non-native short-circuit callers (Claude Code, github_copilot, etc.) see the same payload as before. Failure path still emits the native block pair (with empty results) plus the text-error block, so the client gets a well-formed response rather than a malformed half-shape. 
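The short-circuit response assembly can be sketched as follows. The function name, its arguments, and the `srvtoolu_` id format are assumptions; only the block order, the shared id, the text-only legacy path, and the failure behavior come from the commit text. Callers that need the text should locate it by `type`, since its position differs between native and legacy responses.

```python
import uuid
from typing import Any, Dict, List, Optional


def short_circuit_content(
    query: str,
    text: str,
    native: bool,
    results: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, Any]]:
    """Assemble response content for a short-circuited search request.

    native=False keeps the legacy text-only payload. native=True emits
    [server_tool_use, web_search_tool_result, text] with the two tool blocks
    sharing one id; results=None (search failed) still yields the full triple,
    with an empty result list, so the client never sees half the pair.
    """
    text_block = {"type": "text", "text": text}
    if not native:
        return [text_block]
    # Assumed id format; any unique id works as long as both blocks share it
    tool_use_id = "srvtoolu_" + uuid.uuid4().hex[:24]
    server_tool_use = {
        "type": "server_tool_use",
        "id": tool_use_id,
        "name": "web_search",
        "input": {"query": query},
    }
    tool_result = {
        "type": "web_search_tool_result",
        "tool_use_id": tool_use_id,
        "content": [
            {"type": "web_search_result", "url": r["url"], "title": r.get("title", ""), "encrypted_content": ""}
            for r in (results or [])
        ],
    }
    return [server_tool_use, tool_result, text_block]
```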
* test(websearch_native_blocks): cover short-circuit native-block emission Three new cases on top of the existing 18: - native web_search_20250305 short-circuit → [server_tool_use, web_search_tool_result, text], ids paired, urls/titles carried. - litellm_web_search short-circuit → text-only (no regression). - native short-circuit on search failure → still emits the native block pair (empty results) plus the text-error block, so the client never sees a malformed half-shape. * test(websearch_short_circuit): index assertions by block type, not by position Native short-circuit responses now have [server_tool_use, web_search_tool_result, text] when the input carries web_search_20250305 — find the text block by type rather than relying on content[0]. * fix(websearch_interception): gate legacy WebSearch name on schema absence Clients like Cowork / Claude Desktop ship a client-side tool named "WebSearch" with a full input_schema — they handle it themselves and expect to make a separate native web_search_20250305 sub-request for the actual search. Today is_web_search_tool matches the bare name regardless of other fields, which hijacks the client's tool server-side. The agentic loop fires on the main request, the model never gets to emit the client-side tool_use, and the separate native sub-request (where citation data flows) is never made. Net: citations panel empty. Real Anthropic client tools always carry input_schema (the API rejects them otherwise), so a bare {name: "WebSearch"} with no schema is the only thing that could be a legacy interception marker. Gate the match on schema absence: legacy callers (if any) keep working, real client-side WebSearch tools pass through untouched. * fix(websearch_interception): drop "WebSearch" from response-detection lists Post-conversion the model always sees ``litellm_web_search``, so the "WebSearch" entry in the response-side tool_use detection lists was dead at best. 
If a model ever did return ``tool_use(name="WebSearch")`` it would now (incorrectly) hijack the client's own ``WebSearch`` tool again — same Cowork problem we just fixed on the input side. Drop it. * test(websearch_native_blocks): cover the WebSearch legacy-name schema gate Three new cases: - {name: "WebSearch"} (bare interception marker) → still matched - {name: "WebSearch", input_schema: {...}} (Cowork client tool) → passes through untouched - {name: "WebSearch", description: "..."} (no schema) → still matched on the assumption it's a legacy marker rather than a malformed real client tool. --------- Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> * ci(codecov): restore litellm/ prefix on uploaded coverage paths pytest-cov runs with --cov=litellm, which makes coverage.xml store paths relative to the package root (e.g. `proxy/proxy_server.py` instead of `litellm/proxy/proxy_server.py`). Codecov auto-resolves these only when the basename is unique in the repo. Files like proxy_server.py, router.py, utils.py, main.py, and constants.py — which have duplicates under enterprise/ or other subpackages — get silently dropped during ingest. The `fixes: ["::litellm/"]` rule prepends `litellm/` to every uploaded path so they resolve unambiguously. Confirmed against multiple recent coverage.xml artifacts that no uploader currently emits paths already prefixed with `litellm/`, so the rule is safe to apply universally. This restores Codecov visibility for the highest-fix-rate hotspots: proxy_server.py, router.py, proxy/utils.py, litellm_logging.py, constants.py, key_management_endpoints.py, utils.py, main.py, user_api_key_auth.py, team_endpoints.py, and litellm_pre_call_utils.py. * chore(ci): remove unused GitHub Actions workflows and orphan files Audit of .github/workflows/ via gh run history shows the following have either never run or have been dormant for 10+ weeks. CI coverage that still matters is preserved on CircleCI (e.g. llm_translation_testing). 
Removed workflows: - test-litellm.yml — workflow_dispatch only, last run 2026-02-12 (cancelled); CCI local_testing_part1/2 covers the same tests - llm-translation-testing.yml — last run 2025-07-10; replaced by CCI llm_translation_testing job (run_llm_translation_tests.py kept for the make test-llm-translation target) - run_observatory_tests.yml — last run 2026-03-03 (cancelled) - scan_duplicate_issues.yml — last run 2026-03-02 (failure) - publish_to_pypi.yml — never run - read_pyproject_version.yml — fires on every push to main but its echoed version output is not consumed by any downstream step Removed orphan files (no callers in workflows, CCI, or Makefile): - .github/workflows/README.md — documented only publish_to_pypi.yml - .github/workflows/update_release.py + results_stats.csv - .github/actions/helm-oci-chart-releaser/ * Revert "ci(codecov): restore litellm/ prefix on uploaded coverage paths" This reverts commit `e25a988a3f`. The `fixes: ["::litellm/"]` rule turned out to be applied after Codecov's auto-resolution, not before. Files with unique basenames (which were auto-resolving correctly to `litellm/<path>`) got an extra `litellm/` prepended, producing `litellm/litellm/<path>` storage. Files with ambiguous basenames (the actual target of the fix) continued to be dropped because the auto-resolution still failed for them. Net result on the verification run: 1375 files now stored under unresolvable `litellm/litellm/...` paths, and the 11 originally-missing hotspots are still missing. Reverting before piling on further changes. * test(ui): preserve global Button/Tooltip mocks in per-file @tremor/react vi.mock Per-file `vi.mock("@tremor/react", ...)` factories fully replace the setup-level mock from `tests/setupTests.ts`, so the global Button/Tooltip overrides are lost in any file that re-mocks `@tremor/react`. Without them, the real Tremor `<Button>` leaks through and its internal `useTooltip(300)` schedules a native 300ms `setTimeout` on pointer events. 
When the test environment is torn down before the timer fires, the trailing `setState` calls `getCurrentEventPriority`, which reads `window.event` against a destroyed jsdom -> "window is not defined" flake observed on CI. Patches the 7 leaky test files to re-supply `Button` (bare `<button>`) and `Tooltip` (Fragment) overrides matching `setupTests.ts`. Also drops a dead `afterEach` workaround in `user_edit_view.test.tsx` (the fake-timer dance it ran could not drain a real timer scheduled before the swap) and corrects a misleading comment in `MakeMCPPublicForm.test.tsx`. * ci: use --cov=./litellm so coverage paths resolve unambiguously in Codecov pytest-cov treats --cov=<module-name> as a Python package and emits XML paths relative to the package root, stripping the litellm/ prefix (`proxy/proxy_server.py` instead of `litellm/proxy/proxy_server.py`). Codecov's auto-prefix heuristic then drops every file whose basename is ambiguous in the repo — `proxy_server.py` (3 copies under enterprise/), `router.py` (2 copies), `utils.py` (20+), `main.py` (20+), `constants.py` (2). The 11 highest-fix-rate hotspots have never appeared in Codecov. Switching to --cov=./litellm treats the argument as a path, which makes coverage.xml emit repo-relative paths (`litellm/proxy/proxy_server.py`). Each path is unambiguous, so Codecov resolves all files correctly. Verified locally: rerunning a single proxy_unit_tests test with --cov=./litellm produced `filename="litellm/proxy/proxy_server.py"`, `filename="litellm/router.py"`, and `filename="litellm/types/router.py"` as distinct entries — exactly the disambiguation Codecov needs. Touches every workflow that uploads coverage: the two reusable GHA workflows (_test-unit-base.yml, _test-unit-services-base.yml), test-mcp.yml, and all 14 invocations in .circleci/config.yml. 
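To check which path form a given report uses before uploading, a short script over `coverage.xml` is enough. This is a sketch, not part of the CI change: it assumes the Cobertura-style `<class filename="...">` elements that coverage.py's XML report emits, and the helper names are hypothetical. `ambiguous_basenames` illustrates why package-relative paths break: a basename that occurs more than once in the repo cannot be placed from the basename alone.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List


def unprefixed_paths(coverage_xml: str, prefix: str = "litellm/") -> List[str]:
    """List <class filename="..."> entries in a coverage XML report that lack `prefix`."""
    root = ET.fromstring(coverage_xml)
    filenames = [cls.get("filename", "") for cls in root.iter("class")]
    return sorted(name for name in filenames if not name.startswith(prefix))


def ambiguous_basenames(repo_paths: List[str]) -> List[str]:
    """Basenames that occur more than once, so they cannot be resolved by basename alone."""
    counts = Counter(path.rsplit("/", 1)[-1] for path in repo_paths)
    return sorted(name for name, n in counts.items() if n > 1)


# Package-relative paths, as described for --cov=litellm
package_relative = (
    '<coverage><packages><package name="proxy"><classes>'
    '<class name="proxy_server.py" filename="proxy/proxy_server.py"/>'
    '<class name="router.py" filename="router.py"/>'
    "</classes></package></packages></coverage>"
)
# Repo-relative paths, as described for --cov=./litellm
repo_relative = package_relative.replace('filename="', 'filename="litellm/')
```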
* fix(mcp): allow delegate PKCE bypass for internal MCP servers Remove available_on_public_internet gating from delegate-auth-to-upstream paths so oauth2 + delegate_auth_to_upstream interactive servers behave the same when marked internal. Keeps M2M exclusion. Updates tests. * chore(mcp): warn on internal + upstream PKCE delegate Log verbose_logger.warning when loading oauth2 interactive servers with available_on_public_internet=false and delegate_auth_to_upstream=true (config + DB). Dashboard Alert for the same combo. CLAUDE note for operators. Tests for log and M2M skip. * fix(mcp): dedupe load_servers_from_config alias block Removes accidental duplicate alias/mcp_aliases and get_server_prefix logic (fixes PLR0915 and avoids resetting alias after mapping). * fix(mcp): expose delegate_auth_to_upstream in MCP server list rows (#27936) _build_mcp_server_table omitted delegate_auth_to_upstream, so GET /v1/mcp/server always returned the default false while the registry kept the DB value. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(proxy): fix vector store retrieve/list/update/delete without model (#27929) * feat(proxy): fix vector store retrieve/list/update/delete routing without model Co-authored-by: Cursor <cursoragent@cursor.com> * fix(proxy): remove unchecked query-param injection in vector store management endpoints Co-authored-by: Cursor <cursoragent@cursor.com> * test(proxy): use subset assertion for vector store route test to allow extra kwargs like shared_session Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(managed_batches): convert raw output_file_id to managed ID in CheckBatchCost poller (#27984) * fix(managed_batches): convert raw output_file_id to managed ID in CheckBatchCost poller CheckBatchCost bypasses async_post_call_success_hook, causing raw provider output_file_ids to be persisted in LiteLLM_ManagedObjectTable. 
This fix converts output_file_id and error_file_id to managed base64 IDs before the DB write. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(check_batch_cost): persist managed file before mutating response and propagate team_id - Move setattr after store_unified_file_id so the response only receives the managed ID once the DB record is successfully written. Avoids serializing an orphaned managed ID into file_object when the store call fails. - Populate team_id on the minimal UserAPIKeyAuth from job.team_id so the managed file record is created with the correct team ownership, allowing other team members to access the batch output file via /files/{id}/content. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(managed_batches): extend test to cover error_file_id conversion Co-authored-by: Cursor <cursoragent@cursor.com> * fix managed file test --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(vertex-ai): fix zero cost/usage on completed Vertex AI batch jobs (#27912) * fix(vertex-ai): fix zero cost/usage on completed Vertex AI batch jobs Vertex batch jobs recorded 0 spend and 0 tokens after PR #25627 added automatic transformation of GCS predictions.jsonl to OpenAI format. Two bugs fixed: 1. batch_utils.py: the Vertex-specific cost/usage reader (calculate_vertex_ai_batch_cost_and_usage) was always invoked and reads raw usageMetadata fields that no longer exist in the OpenAI-shaped output. Now the reader is only used when disable_vertex_batch_output_transformation=True; otherwise the generic path handles the already-transformed OpenAI-shaped content. 2. cost_calculator.py: batch_cost_calculator skipped the global litellm.get_model_info() lookup when a model_info dict was passed in, even when that dict had no pricing fields (e.g. deployment metadata with only id/db_model). It now falls back to the global pricing table when the provided model_info has no pricing data. 
Co-authored-by: Cursor <cursoragent@cursor.com> * Update litellm/cost_calculator.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(cost-calculator): use not-any guard for pricing fallback in batch_cost_calculator Co-authored-by: Cursor <cursoragent@cursor.com> * fix(cost-calculator): treat explicit zero batch pricing as set in model_info The fallback to litellm.get_model_info() used truthy checks on pricing fields, so 0.0 was treated as missing and replaced by global rates. Use `is not None` like elsewhere in cost calculation. Add regression test. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * feat: add weighted-routing failover (#27980) * Feat: Add Weighted-Routing Failover * test(router): cover weighted failover helper functions Co-authored-by: Cursor <cursoragent@cursor.com> * fix(router): align weighted failover deployment list type with mypy Co-authored-by: Cursor <cursoragent@cursor.com> * fix(router): address greptile review on weighted failover - Narrow exception swallowing in `_maybe_run_weighted_failover` to `openai.APIError` so model failures defer to the regular fallback while programming bugs (AttributeError/KeyError/TypeError) surface. - Note async-only limitation of `enable_weighted_failover` in the Router constructor docstring. - Make the weighted distribution test less flaky (1000 iterations, looser bound) and make the non-simple-shuffle test deterministic by failing both deployments instead of relying on the latency strategy's first pick. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(router): ensure weighted failover metadata persists in kwargs The previous `kwargs.setdefault(metadata_variable_name, {}) or {}` returned a brand-new dict whenever the existing metadata was falsy (empty dict or None), so writes to `_failover_excluded_ids` never made it back into `kwargs`. Multi-hop weighted failover then re-selected previously failed deployments and exhausted `max_fallbacks` prematurely. Explicitly assign a fresh dict into kwargs when metadata is missing so mutations are visible to subsequent failover hops. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(router): regression for weighted failover metadata persistence Asserts kwargs["metadata"]["_failover_excluded_ids"] is populated after _maybe_run_weighted_failover, proving the metadata dict written by the helper is the same object that lives in kwargs (no disconnected copy). Pairs with the prior fix that replaced `setdefault(..., {}) or {}` with an explicit get/assign so writes survive across hops. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(router): harden weighted failover error/state handling - Catch RouterRateLimitError (ValueError) alongside openai.APIError in _maybe_run_weighted_failover so an exhausted intra-group retry falls through to the regular cross-group fallback path instead of bubbling out and bypassing configured fallbacks. - Stop mutating the shared input_kwargs dict; build a local copy with the weighted-failover keys so the entry (with _excluded_deployment_ids) cannot leak into later fallback paths reading the same dict. - _get_excluded_filtered_deployments now returns an empty list when the exclusion filter removes every healthy deployment, instead of falling back to the original list. 
The original-list behavior risked re-picking the just-failed deployment; callers already handle the empty case by raising their no-deployments error, which weighted failover now catches and converts into a normal cross-group fallback. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(router): fall through to rpm/tpm when total weight is zero When the weight metric's total is zero (e.g. after weighted-failover exclusion leaves only zero-weight backups), continue to the next metric (rpm/tpm) instead of returning a uniform random pick immediately. This lets rpm/tpm still drive routing when present, and only falls back to the uniform random pick at the end if no metric provides a positive total weight. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(router): skip weighted failover when remaining deployments are all in cooldown _maybe_run_weighted_failover was computing 'remaining' from all_deployments (every deployment in the model group, including those in cooldown). This meant that when all non-excluded deployments were in cooldown the method still invoked run_async_fallback unnecessarily, which propagated into async_get_healthy_deployments, found no eligible deployments, and raised RouterRateLimitError — only safely caught thanks to the earlier exception-broadening fix. The fix: before computing 'remaining', fetch the current cooldown set via _async_get_cooldown_deployments and subtract it from all_ids. This allows _maybe_run_weighted_failover to return None immediately (skipping the run_async_fallback call entirely) when every non-failed deployment is in cooldown, letting the caller fall through to the correct cross-group fallback path without the wasteful extra round-trip. 
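The zero-total-weight fix can be illustrated with a simplified weighted pick. The pick tries each metric in order (`weight`, then `rpm`, then `tpm`) and falls back to a uniform random choice only when no metric has a positive total. This is a sketch, not litellm's simple-shuffle strategy. The flat deployment dicts and the `pick_deployment` name are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def pick_deployment(
    deployments: List[dict],
    metrics: Sequence[str] = ("weight", "rpm", "tpm"),
    rng: Optional[random.Random] = None,
) -> dict:
    """Weighted random pick that skips any metric whose total is zero."""
    if not deployments:
        raise ValueError("no deployments to pick from")
    rng = rng or random.Random()
    for metric in metrics:
        # Missing or None values count as zero for this metric.
        weights = [float(d.get(metric) or 0) for d in deployments]
        if sum(weights) > 0:
            # random.choices never selects an entry whose weight is 0.
            return rng.choices(deployments, weights=weights, k=1)[0]
        # Total is zero (e.g. only zero-weight backups remain): try the next metric.
    # No metric had a positive total: uniform random pick.
    return rng.choice(deployments)
```

Returning a uniform pick as soon as `weight` sums to zero would ignore a usable `rpm`/`tpm` signal. The loop only reaches the uniform fallback after every metric has been tried.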
Tests added: - unit: _maybe_run_weighted_failover returns None without calling run_async_fallback when all remaining deployments are in cooldown - unit: _maybe_run_weighted_failover still calls run_async_fallback when at least one healthy (non-cooldown) deployment is available - integration: end-to-end fallthrough to cross-group fallback when remaining deployments are in cooldown Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix(bedrock-mantle): use /anthropic/v1/messages path for Mantle endpo… (#27976) * fix(bedrock-mantle): use /anthropic/v1/messages path for Mantle endpoint (#27943) * docs: add one-line docstring to _disable_debugging (#27894) Squash-merged by litellm-agent from oss-agent-shin's PR. * Add jp. Bedrock cross-region inference profile for claude-sonnet-4-6 (#27831) Squash-merged by litellm-agent from Cyberfilo's PR. * Sanitize empty text content blocks on /v1/messages (#27832) Squash-merged by litellm-agent from Cyberfilo's PR. * fix(bedrock-mantle): use /anthropic/v1/messages path for Mantle endpoint The bedrock-mantle gateway (Claude Mythos Preview) serves the Anthropic Messages API at /anthropic/v1/messages; /v1/messages returns 404 Not Found. Both AmazonMantleConfig (chat/completions caller route) and AmazonMantleMessagesConfig (anthropic-messages caller route) hardcoded the wrong path, so every Mantle request 404'd before reaching the model. Per the Anthropic docs: "[Claude in Amazon Bedrock] uses the Messages API at /anthropic/v1/messages with SSE streaming." 
https://platform.claude.com/docs/en/api/claude-on-amazon-bedrock Confirmed independently against the live endpoint: /v1/chat/completions -> 200 OK /v1/messages -> 404 Not Found (what litellm used) /anthropic/v1/messages -> 200 OK (Claude only) Adds a regression test asserting both Mantle configs build the /anthropic/v1/messages path, and updates the existing assertions that encoded the wrong path. --------- Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> * fix: sanitize empty text blocks in sync anthropic_messages_handler path Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: João Costa <13508071+jpv-costa@users.noreply.github.com> Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(utils): import get_secret at runtime (#28014) * fix(proxy): make /config/update env-var encryption idempotent A single decrypt-then-encrypt chokepoint (_encrypt_env_variables_for_db) now backs both update_config and save_config. Re-submitting a value the Admin UI read back from /get/config/callbacks as ciphertext no longer stacks a second encryption layer, which previously decrypted to garbage and silently broke the callback. The chokepoint decrypts with the pure _decrypt_db_variables (no os.environ mutation on the write path) and encrypts exactly once; update_config merges only the sent keys so untouched env vars keep their stored ciphertext byte-for-byte. 
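The idempotency idea behind `_encrypt_env_variables_for_db` is to try decrypting first, then encrypt exactly once from whatever plaintext results. The sketch below uses a toy reversible encoding (an `enc:` prefix plus base64) purely as a stand-in cipher so it runs without dependencies. It is not real encryption. The names `encrypt_once`, `merge_env_vars`, `toy_encrypt`, and `toy_decrypt` are hypothetical and are not part of litellm's API.

```python
import base64
import binascii

PREFIX = "enc:"  # toy marker; a real cipher would authenticate the ciphertext instead


def toy_encrypt(plaintext: str) -> str:
    """Stand-in cipher: reversible but NOT secure."""
    return PREFIX + base64.b64encode(plaintext.encode()).decode()


def toy_decrypt(value: str) -> str:
    """Raise ValueError if value is not toy ciphertext."""
    if not value.startswith(PREFIX):
        raise ValueError("not ciphertext")
    try:
        return base64.b64decode(value[len(PREFIX):], validate=True).decode()
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError("not ciphertext") from exc


def encrypt_once(value: str) -> str:
    """Decrypt-then-encrypt chokepoint: ciphertext in, still a single layer out."""
    try:
        plaintext = toy_decrypt(value)  # value was already encrypted
    except ValueError:
        plaintext = value  # value is plaintext
    return toy_encrypt(plaintext)


def merge_env_vars(stored: dict, submitted: dict) -> dict:
    """Re-encrypt only the submitted keys; untouched keys keep their stored value."""
    merged = dict(stored)
    for key, value in submitted.items():
        merged[key] = encrypt_once(value)
    return merged
```

With a real randomized cipher, the re-encrypted value would differ byte-for-byte but still decrypt to the original plaintext. That is why the merge step leaves keys that were not sent untouched.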
* test(proxy): add endpoint-level regression for /config/update double-encryption Adds test_update_config_env_var_round_trip_not_double_encrypted, which drives the real /config/update handler: first write plaintext, then re-POST the stored ciphertext (the Admin UI round-trip) and assert the value is not stacked with a second encryption layer and untouched keys stay byte-identical. Verified to fail against the pre-fix handler and pass after. Also tightens the unit test to exactly three ciphertext re-feeds. * chore(ci): modernize model references in tests and configs (#27856) * test: modernize models used in CircleCI e2e test suites Replaces obsolete models (gpt-4o, gpt-4o-mini, gpt-3.5-turbo, claude-3-5-sonnet-20240620, claude-sonnet-4-20250514) with current equivalents across the e2e_openai_endpoints and proxy_e2e_anthropic_messages_tests CircleCI jobs. - gpt-4o -> gpt-5.5 (responses API e2e tests) - gpt-4o-mini -> gpt-5-mini (websocket responses, oai_misc_config) - gpt-4o-mini-2024-07-18 -> gpt-4.1-mini-2025-04-14 (fine-tuning, still actively fine-tunable) - gpt-4 / gpt-3.5-turbo target_model_names example -> gpt-5.5 / gpt-5-mini - bedrock claude-3-5-sonnet-20240620 batch entry -> haiku-4-5-20251001 (also aligning oai_misc_config model_name with what test_bedrock_batches_api.py actually requests) - bedrock claude-sonnet-4-20250514 (deprecated, retires 2026-06-15) -> claude-sonnet-4-5-20250929 * test: point bedrock-claude-sonnet-4 alias at Sonnet 4.6, not 4.5 Greptile/Cursor flagged that after the previous commit, the bedrock-claude-sonnet-4 alias collided with bedrock-claude-sonnet-4.5 (both pointed to claude-sonnet-4-5-20250929). Rename to bedrock-claude-sonnet-4.6 and point it at the Sonnet 4.6 Bedrock ID (us.anthropic.claude-sonnet-4-6, already in the litellm model registry) so the alias name matches the underlying model version. 
* test: modernize models across remaining CI-mounted configs & tests Expands the modernization sweep to all CircleCI-mounted proxy configs and to test directories where the model literal is a fixture/route key (not the test's subject). Config changes: - proxy_server_config.yaml: bump gpt-3.5-turbo / gpt-3.5-turbo-1106 / gpt-4o / gemini-1.5-flash / dall-e-3 underlying models; rename gpt-3.5-turbo-end-user-test alias to gpt-5-mini-end-user-test; bump text-embedding-ada-002 underlying to text-embedding-3-small. User-facing aliases (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, etc.) preserved for backward compatibility with tests. - simple_config.yaml, otel_test_config.yaml, spend_tracking_config.yaml: bump gpt-3.5-turbo underlying to gpt-5-mini. - pass_through_config.yaml: claude-3-5-sonnet / claude-3-7-sonnet / claude-3-haiku entries replaced with claude-sonnet-4-5 / claude-haiku-4-5 / claude-opus-4-7. - oai_misc_config.yaml: align alias name with the gpt-5-mini rename. Test changes (proactive: claude-sonnet-4-20250514 / claude-opus-4-20250514 retire 2026-06-15): - tests/llm_translation/test_anthropic_completion.py: bump 3 references + paired Vertex AI ID to claude-sonnet-4-5. - tests/llm_translation/test_optional_params.py: bump 2 references. - tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py and test_bedrock_anthropic_messages_test.py: bump router fixtures using the deprecated model IDs. - tests/pass_through_unit_tests/base_anthropic_messages_tool_search_test.py: modernize docstring examples. - tests/test_end_users.py: update references to renamed alias. * test: modernize placeholder model literals in router_unit_tests Mass replace_all on fixture/placeholder model literals across the router_unit_tests/ suite (model name is a routing key / label, not the test's subject). Sub-agent sweep so far; additional commits will follow for logging_callback_tests/, enterprise/, top-level tests/test_*.py, and other CI-mounted dirs. 
Mappings applied: - gpt-3.5-turbo -> gpt-5-mini - gpt-4 (bare) -> gpt-5.5 - gpt-4o (bare) -> gpt-5 - text-embedding-ada-002 -> text-embedding-3-small - claude-3-sonnet-20240229 / claude-3-opus-20240229 / claude-3-haiku-20240307 / claude-3-5-sonnet-20240620 -> claude-sonnet-4-5-20250929 / claude-opus-4-7 / claude-haiku-4-5-20251001 as appropriate Explicitly preserved: - gpt-4o-mini-* variants (transcribe, tts, etc.) where they're current - gpt-4-turbo / gpt-4-vision-preview / gpt-4-0613 (subject literals) - JSONL batch body literals - Mock LLM response model fields (must match upstream) - Fake/mock identifiers * test: modernize placeholder model literals across remaining CI suites Sub-agent sweep across logging_callback_tests/, guardrails_tests/, enterprise/, pass_through_unit_tests/, otel_tests/, llm_responses_api_testing/, batches_tests/, spend_tracking_tests/, litellm_utils_tests/, unified_google_tests/, and a few top-level tests/test_*.py files where the model literal is a fixture or placeholder (router model_list, mock standard logging payload, mock callback data) rather than the test's subject. Mappings applied (see scope notes below): - gpt-3.5-turbo -> gpt-5-mini - gpt-4 (bare) -> gpt-5.5 - gpt-4o (bare) -> gpt-5.5 (corrected from initial gpt-5; bare gpt-5 is not a valid OpenAI alias; only gpt-5.5 / gpt-5.4 / gpt-5.2-codex / gpt-5-mini exist) - gpt-4o-mini (bare) -> gpt-5-mini - text-embedding-ada-002 -> text-embedding-3-small - claude-3-sonnet-20240229 -> claude-sonnet-4-5-20250929 - claude-3-opus-20240229 -> claude-opus-4-7 - claude-3-haiku-20240307 -> claude-haiku-4-5-20251001 - claude-3-5-sonnet-20240620/20241022 -> claude-sonnet-4-5-20250929 - claude-3-7-sonnet-20250219 -> claude-sonnet-4-6 - gemini-1.5-flash -> gemini-2.5-flash - gemini-1.5-pro -> gemini-2.5-pro Explicitly preserved (not modernized): - llm_translation/ tests where model is the SUBJECT (provider-specific translation/transformation logic). 
Only the deprecated 20250514 references were already bumped in a prior commit. - Cost-calc / tokenizer subject tests in test_utils.py (skip-ranges documented by the sub-agent). - Bedrock model IDs in test_health_check.py path-stripping tests. - JSONL batch request bodies and mock LLM response bodies (must match upstream literal). - Langfuse expected-request-body JSON fixtures (cost values are exact-match-asserted; changing the model would shift response_cost). - gpt-3.5-turbo-instruct (text-completion endpoint; no modern OpenAI equivalent). - Top-level tests calling the proxy through user-facing aliases (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, dall-e-3): aliases in proxy_server_config.yaml stay; only the underlying model was bumped. - tests/test_gpt5_azure_temperature_support.py (the test's whole point is model-name handling). - Fake / mock / openai/fake identifiers. Notable side fixes: - test_spend_accuracy_tests.py: UPSTREAM_MODEL now matches what spend_tracking_config.yaml's proxy actually routes to (gpt-5-mini), resolving a latent inconsistency. - proxy_server_config.yaml: bare `gpt-5` alias renamed to `gpt-5.5` (bare gpt-5 is not a valid OpenAI alias). - test_batches_logging_unit_tests.py: explicit_models list entries kept distinct (gpt-5-mini + gpt-5.5) after bulk rename. * test: fix CI failures from model modernization sweep CI surfaced 5 categories of regression from the bulk modernization: 1. Azure deployment names are customer-specific. Reverted: - tests/litellm_utils_tests/test_health_check.py: azure/text-embedding-3-small -> azure/text-embedding-ada-002 (the CI Azure account does not have a text-embedding-3-small deployment). - tests/logging_callback_tests/test_custom_callback_router.py: same revert for two router fixtures driving aembedding. 2. gpt-5 family does not accept temperature != 1. 
Tests that pass a custom temperature were swapped from gpt-5-mini to gpt-4.1-mini (modern non-reasoning OpenAI mini that still accepts temperature/logprobs): - tests/logging_callback_tests/test_datadog.py - tests/logging_callback_tests/test_langsmith_unit_test.py - tests/logging_callback_tests/test_otel_logging.py 3. proxy_server_config.yaml's gpt-3.5-turbo-large alias was routing to gpt-5.5 (a reasoning model that rejects logprobs). The proxy test tests/test_openai_endpoints.py::test_chat_completion_streaming exercises logprobs/top_logprobs through that alias. Bumped the underlying model to gpt-4.1 (non-reasoning, still modern). 4. tests/logging_callback_tests/test_gcs_pub_sub.py asserts against a pinned JSON fixture (gcs_pub_sub_body/spend_logs_payload.json) with hardcoded model="gpt-4o" and a model-specific spend value. Reverted the litellm.acompletion calls in the test to model="gpt-4o" so the fixture's exact-match assertions still hold. 5. tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py: anthropic.messages.create routing to openai/gpt-5-mini returned an empty content[0] with max_tokens=100 (reasoning-token consumption). Swapped to openai/gpt-4.1-mini. * test: fix Assistants API model + 2 cursor[bot] review nits 1. pass_through_unit_tests/test_custom_logger_passthrough.py: gpt-5.5 isn't accepted by the /v1/assistants endpoint ("unsupported_model"). Switch to gpt-4.1-mini (modern, Assistants-API-supported, non-reasoning). 2. example_config_yaml/pass_through_config.yaml: the previous sweep bumped the claude-3-7-sonnet alias to claude-opus-4-7, which is a tier change (Sonnet -> Opus). Map to claude-sonnet-4-6 to keep the Sonnet tier intact. (Cursor bugbot review.) 3. example_config_yaml/simple_config.yaml: model_name was left as gpt-3.5-turbo while the underlying was bumped to gpt-5-mini, which muddles the "simple" example. Make both sides gpt-5-mini so the most basic example is a straight 1:1 mapping again. (Cursor bugbot review.) 
* fix: revert gpt-4/gpt-3.5-turbo alias underlying to non-reasoning models tests/test_openai_endpoints.py::test_completion calls the proxy alias "gpt-4" with temperature=0, and other tests call gpt-3.5-turbo with custom temperature / logprobs / the legacy /v1/completions endpoint. The earlier modernization mapped both aliases to gpt-5.5 / gpt-5-mini, which are reasoning models that reject temperature != 1 and don't expose /v1/completions. Map the aliases to gpt-4.1 / gpt-4.1-mini (modern non-reasoning OpenAI models) instead — keeps user-facing aliases preserved while picking a current underlying that still supports the parameters/endpoints the tests exercise. * test(proxy): isolate run_server CLI tests from prisma DB-setup path test_keepalive_timeout_flag and test_timeout_worker_healthcheck_flag were the only run_server tests in test_proxy_cli.py that neither stripped DATABASE_URL/DIRECT_URL nor mocked the prisma DB path. When a DATABASE_URL is present (CI/env leak), run_server --local enters the DB block and blocks in the un-timeout'd subprocess.run(["prisma"]) at proxy_cli.py:987 plus the ProxyExtrasDBManager migrate-deploy retry loops, ~370s per test on the CI runner. --dist=loadscope pins both to one xdist worker, so the proxy-infra job appears stuck at 99% and hits the 20-min timeout. Apply the same isolation every other run_server test in this file already uses: mock PrismaManager.setup_database + should_update_prisma_schema and strip DATABASE_URL/DIRECT_URL. Full module drops from 31.7s to 2.9s locally; both tests fall off the slow list. 
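The isolation pattern removes `DATABASE_URL`/`DIRECT_URL` for the duration of a test and restores them afterwards. It can be expressed with `unittest.mock.patch.dict`, which restores the original mapping on exit. `without_env` is a hypothetical helper name. The real tests also mock `PrismaManager.setup_database` and `should_update_prisma_schema`, which this sketch does not show.

```python
import os
from contextlib import contextmanager
from typing import Iterator
from unittest import mock


@contextmanager
def without_env(*names: str) -> Iterator[None]:
    """Temporarily remove env vars; patch.dict restores os.environ afterwards."""
    with mock.patch.dict(os.environ):
        for name in names:
            os.environ.pop(name, None)  # absent keys are ignored
        yield
```

Inside pytest tests, `monkeypatch.delenv(name, raising=False)` achieves the same scoped removal.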
* feat: add OTEL GenAI latest-experimental semantic convention support (#27418) - Introduce `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental` opt-in that switches OTEL traces to conform with the OpenTelemetry GenAI semantic conventions specification - Extract all semconv behavior into a new `OTELGenAISemconvMixin` class in `gen_ai_semconv.py`, mixed into `OpenTelemetry` to keep concerns separated - In semconv mode, span name follows `{operation} {model}` pattern (e.g. `chat gpt-4`) and span kind is set to `CLIENT` instead of legacy `litellm_request` - Replace `gen_ai.system` with `gen_ai.provider.name` and drop `llm.is_streaming` in semconv mode; add `gen_ai.request.{frequency_penalty,presence_penalty,top_k,seed,stop_sequences,stream,choice.count}` and `gen_ai.usage.cache_{creation,read}.input_tokens` attributes - Replace per-message `gen_ai.content.prompt` / per-choice `gen_ai.content.completion` log events with a single consolidated `gen_ai.client.inference.operation.details` event; omit `gen_ai.input/output.messages` when content capture is disabled - Suppress the non-standard `raw_gen_ai_request` child span entirely in semconv mode - Support both programmatic (`OpenTelemetryConfig.semconv_stability_opt_in` field) and environment variable activation; the two sources are unioned so either or both can enable the opt-in - Extract OTEL SDK `LogRecord` / `SeverityNumber` version-compatibility shim into a reusable `_otel_log_types()` static method to deduplicate the `< 1.39.0` / `>= 1.39.0` import branching - Add 30+ unit tests covering opt-in gating, span naming, attribute emission/omission rules, stop sequence normalization, cache token attributes, and the consolidated event lifecycle Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> * chore: retrigger CI * test(ci): add reasoning_effort grid v4 e2e regression suite Encode the 231-cell QA sweep (21 provider x model combos x 11 effort values) from #27039 / #27074 as an automated CircleCI-gated 
regression suite. Each cell hits the real provider endpoint, captures the outgoing wire body via a pre-call CustomLogger, and asserts: - thinking.type, output_config.effort, thinking.budget_tokens, max_tokens in the captured request body (regression signal for silent drops/strips in any provider transformation) - HTTP status (200 vs BadRequestError -> 400) returned by litellm (regression signal for clean-error vs leaked-500 mappings) The matrix is encoded as a small rule set keyed by (model_mode, effort) plus per-model xhigh/max capability overrides, then expanded across the five chat-completion routes (Anthropic direct, Azure AI Foundry, Vertex AI, Bedrock Converse, Bedrock Invoke /chat) and the Bedrock Invoke /v1/messages route. Cells skip at runtime when the route's provider env vars are absent, so PR builds without credentials no-op gracefully. Wired into CircleCI as the reasoning_effort_grid_v4_e2e job behind the existing main / litellm_* branch filter. * fix(reasoning_effort_grid_v4): cleanup unused fixture, parse converse body, guard budget tokens - Remove unused vertex_credentials_path fixture (and now-unused os import) from conftest.py. - Parse Bedrock Converse complete_input_dict (logged as a JSON string by converse_handler.py) before passing to _assert_cell, so dict accessors work uniformly across routes. - Extend _BUDGET_TOKENS with xhigh and max entries so the budget-mode branch in expected() cannot KeyError if a future budget model gains the matching cap. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(reasoning_effort_grid_v4): grant sonnet-4-6 entries the max-effort cap The runtime _validate_effort_for_model allows effort='max' for any Claude 4.6 model (opus or sonnet), and model_prices_and_context_window sets supports_max_reasoning_effort: true for claude-sonnet-4-6. 
The grid spec previously gave sonnet-4-6 entries _CAPS_NONE, so expected() returned status=400 for effort='max', which mismatched the runtime's status=200 and caused 6 cells (one per route) to fail. Rename _CAPS_OPUS_4_6 to _CAPS_4_6 (since the cap set is shared by opus and sonnet 4.6) and assign it to all sonnet-4-6 entries. Co-authored-by: Yassin Kortam <yassin@berri.ai> * refactor(tests): move reasoning_effort grid suite under llm_translation, drop v4 naming - Drop the "v4" suffix throughout: it referred to the QA sweep iteration, not this test suite. There's only one regression suite, so just call it reasoning_effort_grid. - Move tests/test… * Revert "merge main (#28629)" This reverts commit e4870f7bb68cf1dbc43b4bae9e5ee09b86aea481. * refactor(purview): remove unused logging_only constructor param The logging_only kwarg was passed in but only echoed in a log message; behavioral mode is driven by event_hook. Drop the dead parameter and the no-op derivation in the initializer. * fix(purview): log post-call success hook via @log_guardrail_information * fix(purview): raise HTTPException 400 on Responses API transform failure When the Responses API input cannot be transformed in blocking mode, raise an HTTPException with a clear 400 detail instead of re-raising the raw transformation exception. The latter would surface as a 500 and be indistinguishable from a backend provider failure; the former matches the docstring and the existing pre-call HTTPException patterns. * Fix Purview audit prompt extraction for Responses API logging hook The async_logging_hook receives kwargs == litellm_logging_obj.model_call_details, where function_setup mirrors the raw responses input under the 'messages' key (string or list of input items, not chat-format messages). The previous logic took the generic 'messages' branch first, calling get_prompt_text_for_dlp on data that is not in the chat-message format, and the responses-specific extractor was never reached. 
As a result, the prompt half of the Purview audit was silently skipped (or produced garbage text) for Responses API calls in logging_only mode. Check call_type first and route Responses API calls to _responses_api_input_to_str, which reads the original 'input' and 'instructions' keys that pre_call and update_environment_variables persist on model_call_details. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview_dlp): check call_type for responses before messages in pre-call hook Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview_dlp): don't reject empty-string prompts as token-id prompts completion_prompt_to_str returns None for both token-id lists and empty/whitespace-only strings (stripped). The previous check 'raw_prompt is not None and prompt_text is None' conflated these cases, raising the misleading 'Token-id completion prompts cannot be scanned' error for harmless empty-string prompts like {"prompt": ""}. Tighten the check to only reject true token-id prompts (non-empty list of ints). Empty/whitespace string prompts now fall through to the 'no prompt text → skip scan' path. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview_dlp): scan ResponsesAPI streaming completed event for DLP The streaming iterator hook previously routed all assembled streams through stream_chunk_builder, which only knows chat/text-completion deltas. Responses API streams emit typed events (response.created, response.completed, ...) whose final event carries the full ResponsesAPIResponse, so stream_chunk_builder would raise APIError or pass the assembled response through unchanged. Detect Responses API streaming chunks before the chat/text fallthrough and extract the assembled ResponsesAPIResponse from the latest response.completed (or response.failed / response.incomplete) event, then scan its output_text via the same _completion_response_text_parts path used by non-streaming. 
* fix(purview_dlp): fail closed on incomplete Responses API streams Previously, _assemble_responses_api_from_chunks returned None both when the stream was not a Responses API stream and when it was a Responses API stream but no final ResponsesAPIResponse-bearing event was received. The caller treated both cases identically and fell through to stream_chunk_builder, which does not understand Responses API events. Return a (is_responses_api_stream, assembled) tuple so the caller can fail closed with an accurate error when Responses API events were seen but no final response event arrived, instead of misrouting events to the chat chunk builder. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview_dlp): wrap upstream DLP errors as HTTPException(400) in blocking mode Previously a bare `raise` in `_check_content` re-propagated raw network / HTTP errors (e.g. httpx.HTTPStatusError, ConnectionError) to the client, which would surface as a 500. Now blocking-mode failures from the Graph `processContent` call (and OAuth token / protection-scopes calls) are converted to HTTPException(400) with a structured detail payload, while HTTPException instances raised by upstream layers continue to propagate unchanged. Logging-only mode is unaffected. * test(purview_dlp): cover HTTP error paths for token + Graph POST * fix(purview): include Responses API function_call arguments in DLP scan ResponsesAPIResponse.output_text only aggregates output_text content blocks and ignores function_call items, so sensitive data in model-generated tool-call arguments would bypass the DLP scan. Mirror the ModelResponse path by extracting function_call arguments explicitly from the output list. 
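A simplified sketch of the stream-assembly and scan-extraction steps described above. It detects Responses API streams by the `response.` event-type prefix and keeps the latest terminal event's `response`. It returns an `(is_responses_api_stream, assembled)` tuple so the caller can fail closed, and it collects `function_call` arguments alongside `output_text`. Chunks are plain dicts here, and every function name is hypothetical. litellm's real chunks are typed event objects.

```python
from typing import Iterable, List, Optional, Tuple

TERMINAL_EVENTS = {"response.completed", "response.failed", "response.incomplete"}


def assemble_responses_stream(chunks: Iterable[dict]) -> Tuple[bool, Optional[dict]]:
    """Return (is_responses_api_stream, final_response_or_None)."""
    is_responses_stream = False
    assembled: Optional[dict] = None
    for chunk in chunks:
        event_type = chunk.get("type") or ""
        if event_type.startswith("response."):
            is_responses_stream = True
        if event_type in TERMINAL_EVENTS:
            assembled = chunk.get("response")  # the latest terminal event wins
    return is_responses_stream, assembled


def response_text_parts(response: dict) -> List[str]:
    """Collect output_text blocks plus function_call arguments for the DLP scan."""
    parts: List[str] = []
    for item in response.get("output") or []:
        if item.get("type") == "function_call":
            parts.append(item.get("arguments") or "")
        for block in item.get("content") or []:
            if block.get("type") == "output_text":
                parts.append(block.get("text") or "")
    return parts


def texts_for_dlp(chunks: List[dict]) -> Optional[List[str]]:
    """None means "not a Responses API stream"; the caller uses its chat path."""
    is_stream, assembled = assemble_responses_stream(chunks)
    if not is_stream:
        return None
    if assembled is None:
        # Responses events arrived but no final response: fail closed instead of
        # routing these events to a chat-chunk builder that cannot parse them.
        raise RuntimeError("incomplete Responses API stream; refusing to skip the scan")
    return response_text_parts(assembled)
```

Returning a tuple instead of a bare `None` separates "not a Responses API stream" from "a Responses API stream that never finished". Only the second case should block.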
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(purview_dlp): scan ResponsesAPIResponse in stream_chunk_builder fallback If a Responses API stream slips past _assemble_responses_api_from_chunks (no chunks with type starting with 'response.') and stream_chunk_builder somehow returns a ResponsesAPIResponse, route it through _completion_response_text_parts instead of the 'not a ModelResponse' pass-through that would leak content unscanned. * fix(purview_dlp): preserve upstream Graph status on `_check_content` errors httpx.HTTPStatusError from Graph API (429, 503, etc.) was always wrapped as HTTPException(400), making rate-limits and infrastructure errors indistinguishable from a DLP policy block and stripping retry-after info. Now: - 429 and 5xx pass through with their original status code; the upstream Retry-After header is forwarded. - 401/403 (proxy-side credential/consent issue, not actionable by the client) map to 502 Bad Gateway. - A debug log makes the logging_hook -> async_logging_hook deferral observable so audit failures don't silently disappear if the framework stops dispatching async_logging_hook for some code path. * fix(purview_dlp): reject nested token-id completion prompts OpenAI /v1/completions accepts prompt: [[token, ids]] (multi-prompt token-id batches). The previous blocking-mode check only fired on a flat list[int], so nested or mixed token-id prompts skipped the Purview scan while the model still received the data. Extract the token-id detection into PurviewGuardrailBase.is_token_id_prompt and use it from the pre-call hook so every list shape Purview cannot decode fails closed. * fix(purview): drop caller-influenceable identity fallbacks in audit resolver Logging-only hook now resolves the Purview user from only the proxy-injected user_api_key_user_id (which mirrors UserAPIKeyAuth.user_id after the proxy strips caller-supplied user_api_key_* keys). 
Skipping the audit when no trusted identity is available prevents a caller from submitting metadata.user_id pointing at a victim's Entra object id and having their prompt/response sent to Purview under that user's identity. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Yuneng Jiang <yuneng@berri.ai> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> Co-authored-by: claude <claude@anthropic.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Kenan Yildirim <kenan@kenany.me> Co-authored-by: vladpolevoi <vladp@lasso.security> Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Dennis Henry <dennis.henry@okta.com> Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: João Costa <13508071+jpv-costa@users.noreply.github.com> Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: Shivam Rawat <shivam@berri.ai> Co-authored-by: Tai An <antai12232931@outlook.com> Co-authored-by: Vincent <yimao1231@gmail.com> Co-authored-by: Kris Xia <xiajiayi0506@gmail.com> Co-authored-by: d 🔹 <liusway405@gmail.com> Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com> Co-authored-by: Tom Denham <tom@tomdee.co.uk>
Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com> Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com> Co-authored-by: robin-fiddler <robin@fiddler.ai> Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com> Co-authored-by: Felipe Rodrigues Gare Carnielli <felipe.gare@hotmail.com> Co-authored-by: harish-berri <harish@berri.ai> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Ryan <ryan@Ryans-MBP.localdomain> Co-authored-by: Claude (greptile subagent) <claude-greptile-bot@anthropic.com> Co-authored-by: TorvaldUtne <78661304+TorvaldUtne@users.noreply.github.com> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com> Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com> Co-authored-by: cwang-otto <chengxuan.wang@ottotheagent.com> Co-authored-by: Roman Pushkin <roman.pushkin@gmail.com> Co-authored-by: boarder7395 <37314943+boarder7395@users.noreply.github.com> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com> Co-authored-by: Kevin Zhao <zkm8093@gmail.com> Co-authored-by: Matthew Lapointe <lapointe683@gmail.com> Co-authored-by: Elon Azoulay <elon.azoulay@gmail.com> Co-authored-by: afoninsky <andrey.afoninsky@gmail.com> Co-authored-by: Joseph Barker <156112794+seph-barker@users.noreply.github.com> Co-authored-by: Maruti Agarwal <88403147+marutilai@users.noreply.github.com> Co-authored-by: Cursor Bugbot <bugbot@cursor.com> Co-authored-by: Greptile <greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Greptile Reviewer <greptile-apps@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Felipe Garé <90070734+FelipeRodriguesGare@users.noreply.github.com> Co-authored-by: withomasmicrosoft <withomas@microsoft.com>
Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com>	2026-05-22 15:59:04 -07:00
ryan-crabbe-berri	07bcd2c19e	test(e2e): forward LITELLM_LICENSE to UI e2e proxy (#28398 ) * test(e2e): forward LITELLM_LICENSE to UI e2e proxy The UI e2e job ran without LITELLM_LICENSE, so premium_user was always false in the issued login JWT and premium-gated UI surfaces (Team-BYOK Model switch, etc.) couldn't be driven through the UI. Forward the env var from run_e2e.sh and the CircleCI e2e_ui_testing job, and add a sanity test that decodes the admin storage state token and asserts premium_user=true so the wiring fails loudly if it ever regresses. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update ui/litellm-dashboard/e2e_tests/tests/proxy-admin/license.spec.ts Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-05-21 18:17:03 -07:00
Mateo Wang	8acf64e16c	fix(interactions): never drop streamed text deltas; always emit terminal completion (#28394 ) * fix(interactions): never drop streamed text deltas; always emit terminal completion The interactions streaming bridge had two bugs flagged by Greptile on PR #28153: 1. The first OutputTextDeltaEvent (and the second, when no ResponseCreatedEvent precedes the deltas) was consumed to emit a synthetic interaction.created / step.start event, but the chunk's text payload was never forwarded as a step.delta. The text only reappeared in the terminal step.stop, which defeats the purpose of incremental streaming. 2. When the upstream Responses API stream ended via StopIteration without a ResponseCompletedEvent, the iterator emitted step.stop but never the terminal interaction.completed event carrying the full collected text. This refactors the iterator to translate each upstream chunk into a list of events (instead of a single event) and buffers them in a deque. A text delta now expands into [interaction.created, step.start, step.delta] on the first chunk so no token is dropped, and the StopIteration / StopAsyncIteration fallback always flushes a terminal interaction.completed event when one hasn't already been sent. 
Both behaviors are covered by new unit tests: - test_no_text_token_is_dropped_during_streaming - test_response_created_then_text_delta_emits_step_start_and_delta - test_stop_iteration_fallback_emits_completion_event - test_response_completed_emits_stop_then_completion (no double-emit) Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(interactions): correlate EOF terminal events with stream's interaction id The StopIteration fallback path previously built the terminal step.stop / interaction.completed events with id=None (legacy content.stop) and a memory-address fallback string (interaction.completed), neither of which matched the item_id used by the earlier interaction.created / step.start / step.delta events in the same stream. Downstream consumers correlating events by id would see a mismatch. Persist the interaction id derived from the first upstream chunk (item_id on an OutputTextDeltaEvent, or response.id on a ResponseCreatedEvent) and reuse it when flushing the terminal events on EOF. Author: mateo-berri <277851410+mateo-berri@users.noreply.github.com> * ci(windows): raise UV_HTTP_TIMEOUT to 300s for uv sync The using_litellm_on_windows job has been hitting flaky PyPI download timeouts during 'uv sync --frozen --group dev' — different packages on each rerun (six, pydantic-core), all surfacing the same uv error: Failed to download distribution due to network timeout. Try increasing UV_HTTP_TIMEOUT (current value: 30s). uv's default 30s per-request timeout is too tight for the Windows runner on this project (50+ deps, several multi-MB wheels), so bump it to 300s to let slow individual downloads complete instead of failing the build. 
* fix(interactions): correlate ResponseCompletedEvent terminal events with stream's interaction id When a stream starts directly with OutputTextDeltaEvent (no preceding ResponseCreatedEvent), interaction.created carries item_id while interaction.completed previously carried response.id from ResponseCompletedEvent. The two ids can differ, leaving consumers that correlate events by id unable to match the start and completion events. Fall back to self._interaction_id (set on the first chunk that derives an id) before response.id, mirroring the EOF terminal path. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-20 16:41:40 -07:00
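The event-buffering approach described in #28394 can be sketched in Python as below. This is a minimal model under stated assumptions, not litellm's implementation: `TextDelta`, `Completed`, `InteractionsBridge`, and the plain-dict event shapes are hypothetical stand-ins for the real OutputTextDeltaEvent, ResponseCompletedEvent, and interactions event types, and ResponseCreatedEvent handling is omitted. Each upstream chunk expands into a list of events held in a deque; the first text delta emits interaction.created, step.start, and step.delta together, and end-of-stream flushes step.stop and interaction.completed exactly once, reusing the id captured from the first chunk.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class TextDelta:
    # Hypothetical stand-in for an upstream OutputTextDeltaEvent.
    item_id: str
    delta: str


@dataclass
class Completed:
    # Hypothetical stand-in for an upstream ResponseCompletedEvent.
    response_id: str


class InteractionsBridge:
    """Hypothetical sketch: each upstream chunk becomes a list of events."""

    def __init__(self, upstream: Iterator[object]) -> None:
        self._upstream = upstream
        self._pending: deque = deque()
        self._interaction_id: Optional[str] = None
        self._started = False
        self._completed = False
        self._text: List[str] = []

    def __iter__(self) -> "InteractionsBridge":
        return self

    def _translate(self, chunk: object) -> List[dict]:
        events: List[dict] = []
        if isinstance(chunk, TextDelta):
            if self._interaction_id is None:
                self._interaction_id = chunk.item_id
            if not self._started:
                # The first delta expands into created + start + delta, so no text is dropped.
                events.append({"type": "interaction.created", "id": self._interaction_id})
                events.append({"type": "step.start", "id": self._interaction_id})
                self._started = True
            self._text.append(chunk.delta)
            events.append({"type": "step.delta", "id": self._interaction_id, "text": chunk.delta})
        elif isinstance(chunk, Completed):
            events.extend(self._terminal(fallback_id=chunk.response_id))
        return events

    def _terminal(self, fallback_id: Optional[str] = None) -> List[dict]:
        # Prefer the id seen at stream start so start and completion events correlate.
        interaction_id = self._interaction_id or fallback_id
        self._completed = True
        return [
            {"type": "step.stop", "id": interaction_id},
            {"type": "interaction.completed", "id": interaction_id, "text": "".join(self._text)},
        ]

    def __next__(self) -> dict:
        while not self._pending:
            try:
                chunk = next(self._upstream)
            except StopIteration:
                if self._completed:
                    raise
                # EOF without a completion event: flush the terminal events once.
                self._pending.extend(self._terminal())
                break
            self._pending.extend(self._translate(chunk))
        return self._pending.popleft()
```

Because the end-of-stream flush checks a completion flag, a stream that already ended with a completion event does not emit interaction.completed twice.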
yuneng-jiang	62dca9e977	fix(ci): flag codecov uploads, enable carryforward, close coverage gaps (#28028 ) * fix(ci): flag codecov uploads and enable carryforward Coverage uploads from GHA and CircleCI were unflagged. Commits that receive the push-triggered workflows more than once (re-runs, or branches cut at the same SHA) accumulated many overlapping flagless sessions, and Codecov's per-commit merge dropped the largest, ubiquitously-imported files (router.py, proxy_server.py, main.py, utils.py, cost_calculator.py) from the report even though the uploaded XMLs contained them. - codecov.yaml: flag_management.default_rules.carryforward: true - GHA reusable bases: tag each upload with its workflow/shard name - CircleCI: tag the combined upload "circleci"; also combine the agent / google_generate_content_endpoint / litellm_utils datafiles that were produced and required but missing from the combine list * fix(ci): close coverage gaps in proxy-legacy, router-unit, auth-ui, caching-redis - test-unit-proxy-legacy: route through _test-unit-base so the full proxy_unit_tests suite (incl. comprehensive test_proxy_server.py) is measured and uploaded with per-group flags (was plain pytest, no --cov) - _test-unit-services-base: declare the enable-redis input + the six secrets test-unit-caching-redis passes; that workflow had a workflow_call signature mismatch and startup_failed on every push (never ran). Changes are additive/optional - proxy-db and security callers unchanged - circleci: add --cov + persist + combine + upload-coverage requires for litellm_router_unit_testing (tests/router_unit_tests) and auth_ui_unit_tests (tests/proxy_admin_ui_tests); neither was covered anywhere. 
Redundant -k subset jobs left as-is (local_testing covers them) * fix(ci): remove dead GHA Redis workflow; keep Redis on CircleCI only CircleCI redis_caching_unit_tests already runs the exact same files (tests/local_testing/test_dual_cache.py, test_redis_batch_optimizations.py, test_router_utils.py) with --cov, and that datafile is already combined and uploaded. The GHA test-unit-caching-redis workflow was redundant and had never run (workflow_call signature mismatch -> startup_failure on every push). - Delete .github/workflows/test-unit-caching-redis.yml - Revert _test-unit-services-base.yml to the flag-fix state (drop the enable-redis input / secrets / env wiring added only to prop up the GHA Redis workflow); the verified per-upload flags line is kept - The only single-star "litellm_*" branch glob lived in the deleted file; no other single-star globs exist, so none remain to widen * fix(ci): keep proxy-legacy as a standalone job to preserve required check names Routing proxy-legacy through the reusable workflow renamed each check from the bare matrix name (e.g. "proxy-response-and-misc") to "proxy-response-and-misc / Run tests". Those bare names are required status checks in branch protection, so the old contexts never reported and PRs sat "Expected — Waiting for status to be reported" indefinitely. Restore the original standalone matrix job (job name == matrix name, so the required contexts report again) and add coverage in place: --cov on pytest plus an OIDC Codecov upload flagged proxy-legacy-<group>. Net effect of the gap-#2 fix is preserved (flagged coverage for tests/proxy_unit_tests/*) without changing any check name. * revert(ci): drop all proxy-legacy changes from this PR tests/proxy_unit_tests/** is already fully covered by test-unit-proxy-db (its shard-coverage guard fails CI if any file in that dir is unassigned), which this PR already flags + carryforwards.
Adding --cov and id-token:write to the legacy pull_request job was redundant and put OIDC on a job that runs untrusted PR code. Restore the file to the base version verbatim so this PR no longer touches proxy-legacy at all (also restores its original required check names). Retiring proxy-legacy in favor of proxy-db on pull_request is a separate effort that needs a branch-protection change.	2026-05-16 10:56:32 -07:00
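The shard-coverage guard referenced for test-unit-proxy-db can be pictured as a check that every test file is claimed by at least one shard. The sketch below is hypothetical: the shard names, the basename glob patterns, and `unassigned_test_files` are illustrative assumptions, not the workflow's actual code.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def unassigned_test_files(
    test_files: Iterable[str], shards: Dict[str, List[str]]
) -> List[str]:
    """Return test files that no shard's glob patterns claim (hypothetical guard)."""
    patterns = [pattern for globs in shards.values() for pattern in globs]
    missing = []
    for path in test_files:
        name = PurePosixPath(path).name
        # A file counts as assigned if any shard pattern matches its basename.
        if not any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
            missing.append(path)
    return sorted(missing)
```

A CI step would fail whenever the returned list is non-empty, so a newly added test file cannot silently fall outside every shard.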
Yuneng Jiang	538092a55f	ci: use --cov=./litellm so coverage paths resolve unambiguously in Codecov pytest-cov treats --cov=<module-name> as a Python package and emits XML paths relative to the package root, stripping the litellm/ prefix (`proxy/proxy_server.py` instead of `litellm/proxy/proxy_server.py`). Codecov's auto-prefix heuristic then drops every file whose basename is ambiguous in the repo — `proxy_server.py` (3 copies under enterprise/), `router.py` (2 copies), `utils.py` (20+), `main.py` (20+), `constants.py` (2). The 11 highest-fix-rate hotspots have never appeared in Codecov. Switching to --cov=./litellm treats the argument as a path, which makes coverage.xml emit repo-relative paths (`litellm/proxy/proxy_server.py`). Each path is unambiguous, so Codecov resolves all files correctly. Verified locally: rerunning a single proxy_unit_tests test with --cov=./litellm produced `filename="litellm/proxy/proxy_server.py"`, `filename="litellm/router.py"`, and `filename="litellm/types/router.py"` as distinct entries — exactly the disambiguation Codecov needs. Touches every workflow that uploads coverage: the two reusable GHA workflows (_test-unit-base.yml, _test-unit-services-base.yml), test-mcp.yml, and all 14 invocations in .circleci/config.yml.	2026-05-14 14:01:05 -07:00
Mateo Wang	fdaa288607	ci(circleci): enable Rerun Failed Tests for all pytest jobs (#27155 ) * ci(circleci): enable Rerun Failed Tests for all pytest suites Migrated every pytest-based CircleCI job that uploads JUnit results to use 'circleci tests run' instead of invoking pytest directly. This is the prerequisite for CircleCI's 'Rerun failed tests' feature to be available on each job in the pipeline. For each job: - Glob test files via 'circleci tests glob' and pipe them into 'circleci tests run --command="xargs ... pytest ..."' so the agent can feed the failed-test subset on rerun. - Preserve all original pytest flags (parallelism, timeouts, retries, coverage, junit output paths). - For jobs that previously lacked 'store_test_results' (proxy spend accuracy, proxy_build_from_pip, db_migration_disable_update_check), add the step so JUnit XML is uploaded and rerun is actually wired up. - Replace the dynamic IGNORE_DIRS shell array in llm_translation_testing with a 'grep -v' filter on the glob output, matching the previous behavior of skipping tests/llm_translation/realtime. - For 'build_and_test', glob 'tests/test_*.py' (top-level only), which matches the prior 'tests/*.py' shell glob; the long list of '--ignore=tests/<subdir>' flags was vestigial and is dropped. Jobs already using 'circleci tests run' (local_testing_part1/2, litellm_router_testing) are unchanged. * fix(ci): convert classnames to file paths on rerun CircleCI's Rerun Failed Tests sends each previously failed test as a JUnit classname (e.g. 'tests.otel_tests.test_key_logging_callbacks'), but pytest needs a file path. Without the awk preprocess step, reruns fail with 'file or directory not found'. Mirror the awk transform that local_testing_part1, local_testing_part2, and litellm_router_testing already use, so rerun works in every job that this PR migrated to 'circleci tests run'.
* ci: drop -x from OTEL pytest run so all failures are reported --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com>	2026-05-05 17:27:09 -07:00
Yuneng Jiang	19ad964c4a	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/vigorous-albattani-2b7480	2026-05-04 21:19:34 -07:00
Yuneng Jiang	c1c0506d2c	[Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job The cimg/python:3.12-browsers base image already ships every Chromium system dependency Playwright needs (libnss3, libatk-bridge2.0-0, libcups2, etc. — the install log shows them all as "already the newest version"). Passing --with-deps to `npx playwright install` therefore runs an apt-get update + install for nothing, but pays the full cost of hitting Ubuntu mirrors. On a recent run those mirrors stalled hard: apt-get update alone took 6m53s at 81.5 kB/s with several archives returning connection refused. Drop --with-deps and persist ~/.cache/ms-playwright alongside node_modules so the Chromium binary is also reused across runs. Bump the cache key to v2 so the existing v1 entry (which contained only node_modules) is not restored in place of a cache that includes the new browser path.	2026-05-04 21:19:31 -07:00
Yuneng Jiang	0976fbc6c4	[Fix] Tests: Restore /metrics access for prometheus test suite /metrics now requires auth by default; tests/otel_tests/test_prometheus.py makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so every prometheus test in CI now fails the metric assertion. Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml to opt out for this test job, which scrapes /metrics directly. Verified locally: 8/8 prometheus tests green (one flaky retry on test_proxy_success_metrics that pre-dates this PR). Also drop the -x stop-on-first-failure flag from the otel test command so all failures in the job surface in a single CI run rather than hiding behind whichever one trips first.	2026-05-04 20:54:54 -07:00
Yuneng Jiang	727ab8dcc4	[Fix] Proxy: Break managed-resources import cycle on Python 3.13 The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module ImportError when loading the managed files hook chain: litellm.proxy.hooks/__init__ (mid-import) -> enterprise.enterprise_hooks -> litellm_enterprise.proxy.hooks.managed_files -> litellm.llms.base_llm.managed_resources.isolation -> litellm.proxy.management_endpoints.common_utils -> litellm.proxy.utils (re-enters litellm.proxy.hooks) The except ImportError block in hooks/__init__.py silently swallowed the failure, leaving managed_files unregistered and POST /files returning 500 "Managed files hook not found". Two-layer fix: - Inline the 3-line _user_has_admin_view check in isolation.py instead of importing it from litellm.proxy.management_endpoints.common_utils. litellm.llms.* should not depend on litellm.proxy.* — removing this layering violation breaks the cycle at its root. - Define PROXY_HOOKS and get_proxy_hook before the conditional enterprise import in litellm/proxy/hooks/__init__.py, so any future re-entry resolves the public names instead of hitting an ImportError on a partially-initialized module. Also fold in two unrelated CCI repairs surfaced in the same staging run: - tests/otel_tests/test_key_logging_callbacks.py: per-key gcs_bucket_name / gcs_path_service_account are now stripped by initialize_dynamic_callback_params, so the GCS client falls through to the env-only branch. Update the assertion to match the new "GCS_BUCKET_NAME is not set" message. - .circleci/config.yml: tests/pass_through_tests now resolves google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump, which uses dynamic ESM imports Jest 29 cannot load without --experimental-vm-modules. Pass that flag in the Vertex JS test step. 
Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a regression guard: managed_files / managed_vector_stores must register, and isolation.py must not transitively import litellm.proxy.utils.	2026-05-04 20:05:24 -07:00
Mateo Wang	82dacfb746	Merge pull request #26461 from BerriAI/litellm_fix_circleci_rerun fix(ci): support CircleCI rerun failed tests for local_testing jobs	2026-04-27 13:26:42 -07:00
mateo-berri	68d4420233	fix(ci): strip trailing class segment from JUnit classnames before pytest Pytest tests inside a class produce JUnit XML classnames like 'tests.local_testing.test_file_types.TestFileConsts' (module + class). The previous awk preprocessor would convert this to 'tests/local_testing/test_file_types/TestFileConsts.py', which doesn't exist, causing pytest to collect 0 items on rerun. Strip a trailing '.<UppercaseSegment>' before the dot-to-slash conversion. Module path segments are lowercase (test files start with 'test_'), and the class name is the only segment beginning with an uppercase letter, so this is unambiguous. Verified affected files in tests/local_testing/: test_file_types.py (TestFileConsts), test_gcs_cache_unit_tests.py, test_disk_cache_unit_tests.py, test_docker_no_network_on_deploy.py, test_sagemaker_nova_integration.py, test_cache_preset_key.py.	2026-04-24 16:42:21 -07:00
mateo-berri	ed0a965208	fix(ci): convert dot-notation test paths to file paths for CircleCI rerun CircleCI's 'Rerun failed tests' feature passes test identifiers from the JUnit XML classname attribute (dot notation, e.g. 'tests.local_testing.test_router') via stdin. pytest receives these paths and collects 0 items, causing the rerun to exit 123 with no tests run. Add an awk preprocessor before xargs that detects dot-notation module paths and converts them to file paths (tests/local_testing/test_router.py). File paths already containing '.py' are passed through unchanged. Applied to all three jobs using the 'circleci tests run' + 'xargs pytest' pattern: local_testing_part1, local_testing_part2, and the router test job.	2026-04-24 13:40:02 -07:00
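The rerun preprocessing from ed0a965208 and 68d4420233 is implemented in awk inside .circleci/config.yml; the Python function below is only a sketch of the same transform for illustration. It assumes, as 68d4420233 does, that module segments are lowercase and class segments start with an uppercase letter. Two choices go slightly beyond the commit text and are assumptions: it strips one or more trailing class segments (to cover nested test classes), and it passes an identifier through only when it ends in `.py` or contains `.py::`, rather than whenever `.py` appears anywhere (a dotted module such as `tests.python_helpers` contains `.py` too).

```python
import re

# One or more trailing ".ClassName" segments; module segments start lowercase.
_TRAILING_CLASSES = re.compile(r"(\.[A-Z][A-Za-z0-9_]*)+$")
# Identifiers that already look like pytest file paths or node ids.
_FILE_PATH = re.compile(r"\.py(::|$)")


def junit_classname_to_path(identifier: str) -> str:
    """Convert 'tests.pkg.test_mod.TestCls' to 'tests/pkg/test_mod.py' (sketch)."""
    identifier = identifier.strip()
    if _FILE_PATH.search(identifier):
        # Already a file path; pass it through unchanged.
        return identifier
    module = _TRAILING_CLASSES.sub("", identifier)
    return module.replace(".", "/") + ".py"
```

In the pipeline, each identifier read from stdin would go through this conversion before reaching `xargs ... pytest`.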
shin-berri	8e652d129d	Merge pull request #26356 from BerriAI/litellm_cci_gha_dedup_and_shard [Infra] Remove CCI/GHA test duplication and semantically shard proxy DB tests	2026-04-23 18:17:56 -07:00
shin-berri	7c69262279	Merge pull request #26349 from BerriAI/litellm_deflakeSpendTests [Fix] Deflake spend tracking tests	2026-04-23 16:12:19 -07:00
Yuneng Jiang	c2f40e89d5	[Infra] Remove CCI/GHA test duplication and semantically shard proxy DB tests Split into two related cleanups: 1. Delete CCI jobs that duplicate GHA coverage: - mcp_testing (tests/mcp_tests) — already run by test-mcp.yml - litellm_mapped_tests_proxy_part1/part2 (tests/test_litellm/proxy) — already run across test-unit-proxy-auth.yml, test-unit-proxy-endpoints.yml, and test-unit-proxy-infra.yml Add rag_endpoints and realtime_endpoints to test-unit-proxy-endpoints.yml (they were only covered by the deleted CCI part2 job). Remove the corresponding workflow wiring, coverage combine entries, and upload-coverage dependencies in .circleci/config.yml. 2. Re-shard test-unit-proxy-db.yml from 4 alphabetic buckets to 8 semantic ones (auth-and-jwt, proxy-server, logging-and-callbacks, db-and-spend, guardrails-budget-hooks, endpoints-and-responses, plus the existing serial key-generation and test_proxy_utils.py shards). New test files are placed in whichever group they belong to instead of reshuffling slices. Add a dist input to _test-unit-services-base.yml so the test_proxy_utils.py shard can use --dist=worksteal to spread its ~64 (many parametrized) functions across workers; the default --dist=loadscope pins a single file to a single worker, which was the root cause of that shard running 10m+.	2026-04-23 14:48:38 -07:00
Yuneng Jiang	4af2b67357	[Fix] Drop orphan teardown step from Greptile merge Previous commit from greptile-apps added a new `when: always` teardown step without removing the prior `name:`-only step, leaving a `- run` block with no `command:` — CircleCI config validation rejects that. Collapse back to a single teardown step that runs on success and failure.	2026-04-23 14:21:14 -07:00
yuneng-jiang	8adb3a6a8f	Apply suggestion from @greptile-apps[bot] Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-04-23 14:18:06 -07:00
Yuneng Jiang	e37d1b0cb6	[Fix] Deflake spend tracking tests Two independent deflakes: 1. test_ui_view_spend_logs_unauthorized (unit) was returning 400 instead of 401/403 when earlier tests in the file left proxy-auth globals (prisma_client, master_key, user_custom_auth, general_settings, user_api_key_cache) in a state that let invalid tokens pass auth and fall through to the endpoint's own start_date/end_date validation. Add an autouse fixture that pins those globals to their import-time defaults for every test in the file. Harden the assertion to include response body so future flakes are diagnosable. 2. test_basic_spend_accuracy (CI job proxy_spend_accuracy_tests) depends on the Redis transaction buffer flushing spend to Postgres. The buffer uses a single global pod-lock key (cronjob_lock:db_spend_update_job) and a single global buffer list key. Pointing the proxy at the shared remote Redis means concurrent CI pipelines contend for the same lock and can drain each other's buffer into the wrong database. Add a start_redis reusable command that boots a per-job redis:7-alpine container (digest-pinned), and switch proxy_spend_accuracy_tests to REDIS_HOST=host.docker.internal:6379 so lock and buffer state are isolated per CI run.	2026-04-23 14:13:55 -07:00
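The autouse fixture in e37d1b0cb6 pins proxy-auth globals to their import-time defaults for each test. The context manager below sketches the underlying snapshot, pin, and restore step against a stand-in module; `pinned_globals` and the example defaults are hypothetical, and the real fixture pins the globals named in the commit message.

```python
import types
from contextlib import contextmanager
from typing import Any, Dict, Iterator

_MISSING = object()  # marks attributes that did not exist before pinning


@contextmanager
def pinned_globals(module: types.ModuleType, defaults: Dict[str, Any]) -> Iterator[None]:
    """Pin module-level globals to known defaults for one test, then restore them."""
    saved = {name: getattr(module, name, _MISSING) for name in defaults}
    try:
        for name, value in defaults.items():
            setattr(module, name, value)
        yield
    finally:
        for name, value in saved.items():
            if value is _MISSING:
                # The attribute did not exist before; remove whatever the test left.
                vars(module).pop(name, None)
            else:
                setattr(module, name, value)
```

Wrapped in a pytest fixture declared with `autouse=True` that enters this context and yields, it would apply to every test in the file; pytest's built-in `monkeypatch.setattr` offers the same restore-on-teardown behavior.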
Yuneng Jiang	03a022436b	[Infra] CCI: run RVM install from its own checkout dir The rvm/install script sources scripts/functions/installer using paths relative to the caller's working directory (not $0), so invoking /tmp/rvm/install from /home/circleci/project fails with 'No such file or directory'. Switch to (cd /tmp/rvm && ./install).	2026-04-22 21:50:53 -07:00
Yuneng Jiang	eb6a2d043c	[Infra] CCI: pin Ruby and Node.js installs in proxy_pass_through_endpoint_tests Align the Ruby, Node.js, and npm install path with the rest of the config. Three separate upstream installers were being invoked via `curl ... | bash` or unlocked `npm install`: - RVM's `get.rvm.io/stable` installer (mutable upstream script). Replace with a shallow git clone of the rvm/rvm repo at tag 1.29.12 and verify HEAD matches the published commit SHA before running the local `./install` script. Same pattern already used for the helm-unittest plugin in .github/workflows/helm_unit_test.yml. - NodeSource's `deb.nodesource.com/setup_18.x` piped into sudo bash. Replace with a direct download of the Node.js 18.20.8 linux-x64 tarball from nodejs.org, verified against the published SHASUMS256.txt digest before extraction. - `npm install @google-cloud/vertexai @google/generative-ai` and `--save-dev jest` resolved fresh from the npm registry on every run. Add `tests/pass_through_tests/package.json` with pinned direct-dep versions and commit the generated package-lock.json, then switch CI to `npm ci` (exact lockfile install, fails on drift). Also scope the Ruby+JS test runners to `tests/pass_through_tests/` so they pick up the committed package.json rather than writing node_modules at repo root.	2026-04-22 21:28:42 -07:00
Yuneng Jiang	a12a2190d7	[Infra] Flip remaining CI jobs to Python 3.12 Stragglers from the 2026-04-21 Python 3.12 standardization: - .github/workflows/check_duplicate_issues.yml (was 3.11) - .github/workflows/llm-translation-testing.yml (was 3.11) - .github/workflows/scan_duplicate_issues.yml (was 3.13) - .circleci proxy_build_from_pip_tests (was 3.13) The only intentional non-3.12 CI job is installing_litellm_on_python_3_13, which exists as an explicit "latest supported Python" smoke matrix.	2026-04-22 21:26:19 -07:00
Yuneng Jiang	547d60c642	[Infra] CCI: match Windows uv install path to Linux verification pattern The Windows uv install step was piping a remote install.ps1 into Invoke-Expression without any integrity check, while the Linux install steps (install_uv command, line 89) download to a file, verify SHA-256 against a hardcoded digest, and only then execute. Bring the Windows path to the same pattern. Also hardcode the kubectl v1.31.4 checksum in helm_chart_testing instead of fetching kubectl.sha256 from the same origin as the binary — if dl.k8s.io were ever to serve a tampered pair, a co-hosted checksum provides no additional integrity.	2026-04-22 21:25:22 -07:00
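The download, verify, then execute pattern from 547d60c642 is implemented as shell steps in .circleci/config.yml; the Python function below sketches only the verification step. `verify_sha256` is a hypothetical helper, and the expected digest stands in for a value hardcoded in the repository rather than fetched from the same host as the artifact.

```python
import hashlib
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> None:
    """Raise ValueError unless the file's SHA-256 matches a digest pinned in the repo."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream the file so large installers or tarballs are not read into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_hex.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: expected {expected_hex.strip()}, got {actual}")
```

The caller would run the installer only after this returns without raising, which is the guarantee a checksum file hosted next to the binary cannot add.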
Yuneng Jiang	44362cb167	[Infra] CCI: factor repeated filters and Python docker image to YAML anchors The same branch filter block appeared 46 times in the workflow declaration: filters: branches: only: - main - /litellm_.*/ And the same pinned Python docker image appeared 29 times in jobs: - image: cimg/python:3.12@sha256:9c796c... auth: username: ${DOCKERHUB_USERNAME} password: ${DOCKERHUB_PASSWORD} Replace with YAML anchors declared at first use: - `&main_branches` on using_litellm_on_windows's filters block; all other job entries reference it as `filters: *main_branches`. - `&python312_image` on local_testing_part1's first docker image entry; all other jobs reference `- *python312_image`, including the multi-image jobs (auth_ui_unit_tests, installing_litellm_on_python_v2_migration_resolver) which keep their postgres sidecar entry inline afterwards. Net result: one place to change when the image digest rolls or the branch-filter convention changes. No behavior change — YAML anchor resolution produces identical config at parse time. Also adds a Docker Hub auth block to upload-coverage (previously pulled anonymously). No functional difference for a public image, but avoids Docker Hub rate limits now that we reuse the same entry.	2026-04-22 21:24:06 -07:00
Yuneng Jiang	bea872a034	[Infra] CCI: remove dead steps accumulated across jobs Clean out copy-paste debug and workaround lines that serve no purpose: - `pwd && ls` echoes at the top of 30 "Run tests" steps (CCI already logs working_directory on every step). - "Show git commit hash" in local_testing_part1/part2 and langfuse_logging_unit_tests (CCI shows the SHA in every job header). - "Verify Docker is available" stubs in 6 machine-executor jobs (machine executors always have Docker). - `sudo systemctl restart docker` in proxy_store_model_in_db_tests (one-off workaround; not used anywhere else). - Duplicated Black formatting step in local_testing_part1 and local_testing_part2 — Black runs in the lint job, no reason to run it again here. - Second back-to-back `helm test litellm --logs` invocation in helm_chart_testing (one call is enough). No behavior change — these are all log-only or no-op steps.	2026-04-22 21:18:16 -07:00
Yuneng Jiang	28e1d2f1a6	[Infra] CCI: unify uv cache key and cache only ~/.cache/uv Consolidate 7 distinct cache-key prefixes (v2-dependencies-, v1-router-testing-deps-, v1-router-unit-deps-, v1-llm-translation-deps-, v1-llm-responses-deps-, v3-litellm-uv-deps-, ui-e2e-py-deps-v2-) onto a single v1-uv-cache-<uv.lock checksum> key shared across all Python jobs. Cache only ~/.cache/uv (the content-addressed uv download cache, hash-verified against uv.lock at install time). Drop ./.venv, ~/.local/{bin,lib}, and /home/circleci/.{pyenv,local} from cache paths. ~/.cache/uv is the only path uv sync needs to avoid re-downloading from PyPI; everything else is rebuilt each run from that verified cache. Remove partial-prefix restore-keys fallbacks — cache either hits exactly on the uv.lock hash or rebuilds cleanly. First run after merge will cold-miss on the new key; subsequent runs hit the unified cache.	2026-04-22 16:14:24 -07:00
shin-berri	b6fdd46636	Merge pull request #26270 from BerriAI/litellm_/lucid-kowalevski-de832f [Fix] Stabilize flaky spend accuracy tests + patch Redis buffer data-loss path	2026-04-22 15:02:24 -07:00
Yuneng Jiang	5445297da9	[Fix] Stabilize flaky spend accuracy tests with local ground truth Replace the calibration step (one request + 10-minute poll) with an independent ground truth computed from response usage via litellm.cost_per_token. All N requests are made up front, so a single dropped Redis write no longer kills the test. Add /health/readiness checks at test start and on poll timeout so the failure message surfaces proxy state (db, cache) instead of "calibration timed out". Set PROXY_BATCH_WRITE_AT=2 in the spend tracking CI job to shorten the scheduler flush window.	2026-04-22 13:45:00 -07:00
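The local ground truth in 5445297da9 prices each response from its own usage block via litellm.cost_per_token. The sketch below keeps the pricing function injectable so it runs without network access or model-price lookups; `Usage`, `expected_spend`, `spend_matches`, and the tolerance values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# (prompt_tokens, completion_tokens) -> (prompt_cost_usd, completion_cost_usd)
CostFn = Callable[[int, int], Tuple[float, float]]


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int


def expected_spend(usages: Iterable[Usage], cost_fn: CostFn) -> float:
    """Sum the spend implied by each response's own usage block (local ground truth)."""
    total = 0.0
    for usage in usages:
        prompt_cost, completion_cost = cost_fn(usage.prompt_tokens, usage.completion_tokens)
        total += prompt_cost + completion_cost
    return total


def spend_matches(observed: float, expected: float, rel_tol: float = 1e-6) -> bool:
    """Compare proxy-reported spend with the locally computed total."""
    return math.isclose(observed, expected, rel_tol=rel_tol, abs_tol=1e-12)
```

With litellm installed, `cost_fn` could be `lambda p, c: litellm.cost_per_token(model=MODEL, prompt_tokens=p, completion_tokens=c)`, which returns a (prompt cost, completion cost) tuple; `MODEL` is a placeholder for whatever model the test calls. Because every request's expected cost is known up front, one dropped write shows up as a spend shortfall rather than a calibration timeout.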
Yuneng Jiang	1b74c35b89	[Infra] Move non-API-key CCI jobs to GitHub Actions Principle: GHA handles work that doesn't need external API keys; CCI stays for integration tests that hit real API endpoints. Four CCI jobs moved to new or extended GHA workflows: 1. check_code_and_doc_quality (was 25 runs: ruff + import-safety + 21 code_coverage_tests + 3 documentation_tests + circular-imports). - The 21 tests/code_coverage_tests/*.py scripts and the 3 tests/documentation_tests/*.py scripts run in the new .github/workflows/test-code-quality.yml workflow. - ruff, import-safety, and circular-imports were already run by .github/workflows/test-linting.yml — no new migration needed. - The 3 documentation_tests scripts read docs/my-website/docs/proxy/config_settings.md. Since docs have moved to BerriAI/litellm-docs, the GHA workflow checks out that repo and symlinks docs/my-website -> the checkout so the existing hardcoded paths resolve without touching the scripts. The stale local docs/my-website/ copy in this repo will be removed in a separate PR. 2. semgrep (custom-rule SAST against .semgrep/rules). - New .github/workflows/test-semgrep.yml. 3. installing_litellm_on_python + installing_litellm_on_python_3_13 (pip install compat checks on Python 3.12 and 3.13). - New .github/workflows/test-install-litellm.yml as a matrix job. - 3.12 run also verifies litellm_enterprise import; 3.13 run skips that check (matches previous CCI behavior). - installing_litellm_on_python_v2_migration_resolver stays in CCI because it requires a postgres service. CCI .circleci/config.yml: -112 lines, 4 jobs and their workflow refs removed.	2026-04-22 13:38:00 -07:00
Yuneng Jiang	61fd4e985e	[Infra] CCI config cleanup — dead step, filter dupe, cache keys, machine image Follow-up cleanup after an independent review pass surfaced a few loose ends: - Delete a 6x-duplicated filter block in litellm_mapped_tests_proxy_part2 (same kind of copy-paste residue we fixed earlier in langfuse_logging_unit_tests). - Delete the empty "Install Semgrep" run step in the semgrep job — the command body was empty because semgrep is installed on-demand via uv tool run in the next step. - Standardize machine-executor image: one job was on ubuntu-2204:2023.10.1 while build_docker_database_image was already on ubuntu-2204:2024.04.1. Bumped everything to 2024.04.1. - Remove the legacy "version: 2" inside the workflows: block — CircleCI 2.1 top-level already declares the version. - Drop `{{ checksum ".circleci/config.yml" }}` from cache keys (13 sites). It was busting the cache on every unrelated config edit; the uv.lock checksum alone is the right dependency cache key. - Add partial-restore fallbacks to every restore_cache with a single templated key (10 sites). Jobs now fall back to the latest cache with a matching prefix if the exact uv.lock hash isn't cached yet. Net: -14 lines.	2026-04-21 23:31:01 -07:00
Yuneng Jiang	0a65d2c535	[Infra] Standardize default Python to 3.12 and remove miniconda setup Docker-executor jobs: - Consolidate base images on cimg/python:3.12. Jobs previously on 3.11 (26 jobs), 3.9 (1 historical: upload-coverage), and an incidental 3.13.1 (litellm_assistants_api_testing) now use 3.12. - installing_litellm_on_python_3_13 keeps cimg/python:3.13.1 as its explicit "latest Python supported" install-check matrix job. Machine-executor jobs: - Delete the miniconda install step from 10 jobs. uv now manages Python directly: uv sync --python 3.12 auto-downloads a python-build-standalone interpreter if the ubuntu-2204 base image's default python doesn't match. - Remove 37 "if [ -f conda.sh ]; then conda activate myenv" wrappers and 2 unconditional conda activate blocks left behind from the conda days. - proxy_build_from_pip_tests keeps its 3.13 target (it was conda create -n myenv python=3.13) via uv sync --python 3.13. Net: -301 lines.	2026-04-21 23:19:21 -07:00
Yuneng Jiang	344be27e83	[Refactor] Add start_postgres reusable command and migrate call sites Add a start_postgres command parameterized on db_name (default circle_test) that runs the postgres-db container and waits for port 5432 to accept connections. Replace all 11 inline docker run / wait_for_service blocks with a single - start_postgres call. The helm chart test overrides db_name to litellm_test; everything else uses the default. One of the 11 sites previously used a bespoke pg_isready loop instead of wait_for_service; it now goes through the same TCP-probe path everyone else uses, which is sufficient for test ordering purposes. Net: -112 lines.	2026-04-21 23:14:46 -07:00
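Below is a minimal sketch of the kind of TCP probe the start_postgres commit relies on. It assumes bash, which provides the `/dev/tcp` pseudo-device and `SECONDS`. The `wait_for_port` name and its arguments are hypothetical, and this is not the repo's actual `wait_for_service` implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a TCP port until it accepts a connection or the
# timeout (in seconds) expires. Requires bash for /dev/tcp and $SECONDS.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    # The subshell opens a TCP connection on fd 3; it closes when the subshell exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}
```

In CI it would run right after the container starts, e.g. `wait_for_port localhost 5432 60`. A successful TCP accept only shows that the port is open, not that Postgres has finished initializing. The commit judges that sufficient for test ordering, whereas `pg_isready` checks the server itself.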
Yuneng Jiang	f490340a52	[Refactor] Add install_uv reusable command and migrate all call sites Add a single install_uv command in the commands: section that encodes the uv version (0.10.9) and its SHA256 in one place, then replace all 42 inline curl|sha256|install blocks across every job that needs uv. setup_litellm_test_deps now calls install_uv too, so the shared test-dep bootstrap goes through the same path. Bumping the uv version or SHA is now a one-line change instead of 43. Net: -203 lines.	2026-04-21 23:13:42 -07:00
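The install_uv commit pins a version and its SHA256 in one place. The sketch below covers only the verification half of that pattern and assumes GNU coreutils `sha256sum`. `verify_sha256` is a hypothetical name, and in CI the pinned values would come from the CircleCI command's parameters rather than from this snippet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file's SHA256 matches the pinned value.
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum -c reads "<hash>  <path>" lines; "-" means stdin.
  # --status suppresses output, so only the exit code reports the result.
  echo "${expected}  ${file}" | sha256sum -c --status -
}
```

A caller would download first and execute only on a match, e.g. `curl -LsSf "$UV_INSTALLER_URL" -o installer.sh && verify_sha256 installer.sh "$UV_INSTALLER_SHA256" && sh installer.sh`. Both variables are placeholders.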
Yuneng Jiang	439bbd223b	[Infra] Clean up unused CCI jobs and pin docker images by digest - Remove mypy_linting job (GHA test-linting.yml already runs this) - Remove three redundant "Install curl" apt-get steps (curl is already present on the ubuntu-2204 machine image and used successfully earlier in each affected job) - Dedupe langfuse_logging_unit_tests filter block (6x copy of the same two branch filters collapsed to 1) - Pin all docker image references by @sha256 digest so builds stay reproducible when upstream tags are updated: cimg/python:3.9, 3.11, 3.12, 3.12-browsers, 3.13.1, cimg/node:20.19, cimg/postgres:16.0, and postgres:14 used via docker run Net: -62 lines, 49 image references pinned.	2026-04-21 23:09:41 -07:00
Yuneng Jiang	ee550e1949	[Test] CI: add v2 migration resolver coverage with local Postgres Adds end-to-end CI coverage for `--use_v2_migration_resolver` via a new job `installing_litellm_on_python_v2_migration_resolver`: - Clones the pytest smoke path from `installing_litellm_on_python` but uses a local Postgres sidecar instead of the shared DB to prevent collisions with the v1 variant. - Runs only the new `test_litellm_proxy_server_config_no_general_settings_v2_resolver` which spawns the proxy with `--use_v2_migration_resolver` and smoke-tests `/health/liveliness` and `/chat/completions`. Refactors `test_basic_python_version.py`: - Extracts the proxy spawn + smoke-test body into `_run_proxy_server_smoke_test` so the v1 and v2 tests share the same code path. - The existing `test_litellm_proxy_server_config_no_general_settings` is now a thin wrapper that passes no extra args (v1 default, unchanged). - Adds `..._v2_resolver` variant that passes `--use_v2_migration_resolver`. The existing `installing_litellm_on_python` / `installing_litellm_on_python_3_13` jobs filter out the v2 variant via `-k "not v2_resolver"` so they keep running only against their shared DB, with behavior unchanged.	2026-04-21 14:40:11 -07:00
Yuneng Jiang	0f5d503169	fix(ci): make e2e_ui_testing actually test the freshly built UI bundle The Build UI from source step used: cp -r out/ ../../litellm/proxy/_experimental/out/ GNU cp (CircleCI's Ubuntu image, coreutils 8.32) interprets this as "copy the source directory as a CHILD of the destination" when the destination already exists, so the command silently created litellm/proxy/_experimental/out/out/ instead of replacing the served bundle at litellm/proxy/_experimental/out/. The proxy continued serving whatever bundle was checked in, so every e2e_ui_testing run between this job's introduction (`d09d98a70a`, 2026-04-08) and the bundle-rebuild commit (`de790fd273`, 2026-04-18) was effectively testing a STALE bundle, not the fresh build. That is why the double-prefix regression (NEXT_PUBLIC_BASE_URL="ui/" combined with networking.tsx reading the env var) was never caught in CI even though the source contained the trigger the whole time: the bundle the proxy served never picked up the source change. Replace cp -r with rm + mv so the destination is cleanly swapped. Verified end-to-end on an Ubuntu 22.04 / GNU coreutils 8.32 container: - Before fix: fresh build has 9 "ui/" literals in chunks; after cp, _experimental/out/ still has 0 (stale); _experimental/out/out/ is a nested dir the proxy does not serve. - After fix: _experimental/out/* has 9 "ui/" literals; the proxy now serves the freshly built (broken, in this repro) bundle, so globalSetup fails at login and every spec is blocked. Removing the bug from .env.production and rebuilding brings the count back to 0 and the suite passes. No spec changes, no fixtures, no new infrastructure. The existing Playwright suite already catches this class of regression via the login flow in globalSetup; it just needs the CI to actually hand it the freshly built bundle.	2026-04-20 22:09:54 -07:00
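Below is a small reproduction of the cp pitfall described in the e2e_ui_testing fix. It assumes GNU coreutils `cp`; BSD/macOS `cp` treats a trailing slash on the source differently. The directory names are illustrative scratch paths, not the repo's real layout.

```shell
#!/usr/bin/env bash
# Scratch layout: build/out is the fresh bundle, served/out is the stale one.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/build/out" "$demo_root/served/out"
echo fresh > "$demo_root/build/out/index.html"
echo stale > "$demo_root/served/out/index.html"

# Buggy form: the destination already exists, so GNU cp creates served/out/out/.
cp -r "$demo_root/build/out/" "$demo_root/served/out/"
after_cp_top="$(cat "$demo_root/served/out/index.html")"        # still "stale"
after_cp_nested="$(cat "$demo_root/served/out/out/index.html")" # "fresh", one level too deep

# Fixed form: remove the served bundle, then move the fresh build into its place.
rm -rf "$demo_root/served/out"
mv "$demo_root/build/out" "$demo_root/served/out"
after_mv_top="$(cat "$demo_root/served/out/index.html")"        # "fresh"
```

Besides avoiding the nested copy, rm + mv also discards files that exist only in the old bundle, which a contents-only copy such as `cp -r out/. dest/` would leave behind.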
Yuneng Jiang	bb62099323	[Fix] CI - auth_ui_unit_tests: use Postgres sidecar instead of shared DB Run auth_ui_unit_tests against a per-job cimg/postgres:16.0 sidecar with DATABASE_URL pointing at localhost:5432, matching the pattern used by e2e_ui_testing. Seed the schema via 'litellm --skip_server_startup --use_prisma_db_push' so each run starts on a clean DB with the current schema.prisma.	2026-04-20 16:22:10 -07:00
Yuneng Jiang	f24c8dbf79	chore: bump CircleCI conda envs from python 3.9 to 3.10 Six CI jobs create a miniconda env with python=3.9 before installing the project; these jobs now fail resolution because the project's requires-python is >=3.10. Bump the conda env python to 3.10 to match the new floor.	2026-04-18 13:00:03 -07:00
Yuneng Jiang	ebac729146	[Infra] CI: reduce llm_translation_testing parallelism and tolerate worker restarts Workers in llm_translation_testing have been crashing mid-run with "Not properly terminated" (OOM), even after bumping resource_class to xlarge. Reduce xdist workers from 8 to 4 to lower peak memory, and add --max-worker-restart=5 so a crashed worker is replaced instead of failing the whole run.	2026-04-16 13:10:22 -07:00
shin-berri	65717add14	Merge pull request #25887 from BerriAI/litellm_/vigilant-cannon [Infra] Bump llm_translation_testing resource class to xlarge	2026-04-16 11:53:52 -07:00
Yuneng Jiang	72ba880905	[Infra] Bump llm_translation_testing resource class to xlarge	2026-04-16 11:50:55 -07:00
Yuneng Jiang	55f2a898be	[Infra] Remove unused publish_proxy_extras and prisma_schema_sync jobs publish_proxy_extras is superseded by PyPI trusted publishing (OIDC); the CircleCI project no longer has PYPI_PUBLISH_* credentials configured. prisma_schema_sync is a leaf smoke test with no dependents, and db push against the current schema is already exercised by e2e_ui_testing.	2026-04-15 16:43:30 -07:00
joereyna	a01cf44c35	fix: remove non-existent litellm_mcps_tests_coverage from coverage combine	2026-04-14 18:59:25 -07:00

836 Commits