33c363d4d4
836 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
33c363d4d4
|
Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic (#29847)
* test(ci): extend record/replay proxy to chat, embeddings, moderations, rerank, anthropic
The record/replay proxy that took the gpt-image-1 spend E2E off the live OpenAI path now fronts every provider, so the other real-provider E2Es stop paying for and depending on live calls each commit. It keys per upstream and selects a non-OpenAI provider by a /__recorder_upstream/<host>/ path prefix carried on the model's api_base, since some litellm handlers (cohere rerank) drop custom request headers.
Wired into build_and_test (chat, embeddings, moderations, image), the otel job (cohere rerank), and the anthropic-messages job via a reusable start_openai_record_replay_proxy command. Dropped the time.time()/uuid prompt cache-busters in the build_and_test chat tests, whose config has the response cache off, so identical requests are recordable. The image spend test now asserts a repeat call still bills spend, failing loudly if the proxy response cache is ever turned on.
Responses, the anthropic passthrough, bedrock, and fake-endpoint tests are left live: their lifecycles, api_base assertions, providers, or fake targets make a stateless body-keyed cache either break them or add nothing.
* docs(ci): note the recorder command's OpenAI default upstream and prefix override
Addresses a review note: the shared start_openai_record_replay_proxy command defaults the upstream to OpenAI, so a non-OpenAI model must carry the /__recorder_upstream/<host>/ prefix on its api_base. Document that in the command description so a future caller does not assume the default follows the provider.
|
||
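The upstream selection described in #29847 can be sketched roughly as follows. This is a minimal illustration, not the code in tests/_openai_record_replay_proxy.py: the function name `split_recorder_upstream` and the constant names are hypothetical, and only the /__recorder_upstream/<host>/ prefix and the OpenAI default come from the commit message.

```python
from typing import Tuple

RECORDER_PREFIX = "/__recorder_upstream/"
# Assumed default host, following the commit's "defaults the upstream to OpenAI".
DEFAULT_UPSTREAM_HOST = "api.openai.com"


def split_recorder_upstream(path: str) -> Tuple[str, str]:
    """Return (upstream_host, forwarded_path) for an incoming request path.

    Paths without the prefix go to the default upstream unchanged; paths
    with the prefix have the host segment peeled off.
    """
    if not path.startswith(RECORDER_PREFIX):
        return DEFAULT_UPSTREAM_HOST, path
    rest = path[len(RECORDER_PREFIX):]
    host, _, tail = rest.partition("/")
    if not host:
        raise ValueError(f"missing upstream host in {path!r}")
    return host, "/" + tail
```

Carrying the host in the path rather than in a header matches the stated reason for the design: a path on api_base survives handlers that drop custom request headers.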
|
|
84247d954d
|
test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound (#29787)
* test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound
The dockerized spend test test_key_info_spend_values_image_generation curls the proxy for a gpt-image-1 image, which wildcard-routes to real api.openai.com on every commit; an OpenAI outage then reddens unrelated PRs and each run pays for an image.
Add an in-repo record/replay reverse proxy (tests/_openai_record_replay_proxy.py) that sits between the proxy and OpenAI. The first run, and the first after the recording lapses, records live; subsequent runs replay from the shared Redis cassette store. The proxy keeps its real separate-process HTTP topology; only the image model's api_base is pointed at the recorder in CI via IMAGE_GEN_RECORDER_BASE_URL, which is unset elsewhere so it falls back to api.openai.com.
Recordings lapse 24h after write and are never refreshed on read, matching the VCR persister contract, so provider drift is still caught. Replayed responses drop upstream framing/server headers (content-length, transfer-encoding, content-encoding, date, server) so the re-serving layer recomputes them, honoring the Bedrock content-length lesson.
* test(ci): close recorder http client on app shutdown
Add a Starlette lifespan that closes the self-created httpx.AsyncClient on teardown, and leave caller-injected clients untouched so reuse across create_app calls is not broken. Covers the unclosed-client ResourceWarning raised in review.
|
||
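The expiry and header rules in #29787 might look like the sketch below. It assumes an in-memory dict in place of the shared Redis store, and the class `CassetteStore` and its injectable `clock` are hypothetical names for illustration; the 24h write-time TTL and the list of dropped headers are taken from the commit message.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Headers the commit says are dropped on replay so the serving layer recomputes them.
FRAMING_HEADERS = {"content-length", "transfer-encoding", "content-encoding", "date", "server"}
TTL_SECONDS = 24 * 60 * 60


def strip_framing_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Drop framing/server headers, comparing names case-insensitively."""
    return {k: v for k, v in headers.items() if k.lower() not in FRAMING_HEADERS}


class CassetteStore:
    """In-memory stand-in for the Redis cassette store (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Dict[str, str], bytes]] = {}

    def record(self, key: str, headers: Dict[str, str], body: bytes) -> None:
        # Expiry is fixed at write time.
        self._entries[key] = (self._clock() + TTL_SECONDS, dict(headers), body)

    def replay(self, key: str) -> Optional[Tuple[Dict[str, str], bytes]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, headers, body = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lapsed: the next call records live again
            return None
        # Reading does NOT extend expires_at, so drift is re-checked every 24h.
        return strip_framing_headers(headers), body
```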
|
|
770fff7058
|
test(proxy): stop running real-DB tests in GitHub Actions unit jobs (#29700)
* test(proxy): stop running real-DB tests in GitHub Actions unit jobs
GitHub Actions unit jobs were spinning up a Postgres service container, but the only active tests that touched it either used the DB incidentally (a cargo-culted prisma_client.connect()) or were genuine integration tests mislabeled as unit. Mock the incidental ones so the proxy-db job needs no container, and move the tests that genuinely need a database (proxy management behavior, master-key-not-persisted, schema-migration sync) to CircleCI, which is already the real-infrastructure lane.
* test(proxy): restore no-unexpected-startup-writes canary in master-key test
Greptile noted the hash-match assertion no longer catches other unexpected startup writes (a default key, a rotation artifact). The CircleCI job gives each run a fresh DB, so a clean startup must leave the table empty; add that canary back alongside the precise master-key assertion.
|
||
|
|
84969aaf15
|
fix(ci): keep coverage rename green when a parallel node runs no tests (#29608)
* fix(ci): keep coverage rename green when a parallel node runs no tests
local_testing_part1 and local_testing_part2 run with parallelism 4. When
CircleCI reruns only the failed tests, the failed test lands on a single
node and the other nodes receive an empty bucket, so pytest never writes
coverage.xml or .coverage. The unguarded "mv coverage.xml ..." then exits
1 and turns the whole job red even though the rerun passed; the next
persist_to_workspace step would fail the same way on the missing paths.
Guard the rename so a node with no coverage emits empty placeholders
instead. coverage combine tolerates the empty files, so the downstream
upload-coverage job keeps the real nodes' data intact.
* fix(ci): pre-create test-results in litellm_router_testing for empty-bucket reruns
litellm_router_testing also runs with parallelism 4. On a rerun of only the
failed tests, a node can receive no tests, so the test command never creates
test-results and the final store_test_results step can fail on the missing
path. Pre-create the directory up front, matching what local_testing_part1
and part2 already do and CircleCI's own guidance for parallel reruns.
* test(openai): retry wildcard chat completion on transient OpenAI 500
build_and_test reddened on test_openai_wildcard_chat_completion when the
real gpt-3.5-turbo-0125 call returned an OpenAI 500 ("The server had an
error while processing your request"). The base branch passed the same
call concurrently, so the 500 is an intermittent OpenAI server error, not
a regression. Add the same pytest-retry marker the sibling real-call tests
in this file already use so a transient upstream 500 no longer fails CI.
|
||
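The guarded rename from #29608 is a shell step in the CI config; an equivalent in Python could look like this. The helper name `rename_or_placeholder` is hypothetical, and the sketch only mirrors the behaviour the commit describes: move the file when it exists, otherwise emit an empty placeholder.

```python
from pathlib import Path


def rename_or_placeholder(src: Path, dest: Path) -> bool:
    """Move src to dest when src exists; otherwise create an empty dest.

    Returns True when real coverage data was moved, False when a
    placeholder was written for a node that ran no tests.
    """
    if src.exists():
        src.replace(dest)
        return True
    dest.touch()
    return False
```

Either way the destination path exists afterwards, so a later step that persists or uploads the file does not fail on a missing path.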
|
|
34293fa80a
|
ci: reproduce default-Windows wheel install to guard MAX_PATH (#29597)
* ci: reproduce default-Windows wheel install to guard MAX_PATH
The existing using_litellm_on_windows job installs the project with `uv sync`, an editable source install that never copies package files into a deep site-packages path, so it cannot see the 260-char MAX_PATH overflow that breaks `pip install litellm` on default Windows. The content-filter benchmark fixtures have hit that limit three times (#21941, #22039, #29536), each caught only after release.
This adds a guard to the same job that builds the wheel and installs it the way an end user would: into a venv whose site-packages prefix is padded to a realistic worst-case Windows length (~100 chars), then asserts the install completes and litellm imports. Any packaged path long enough to bust MAX_PATH at that prefix is reported up front, so the check is deterministic regardless of the runner's long-path setting, while the real install also covers failure modes a length heuristic cannot (half-unpacked packages, reserved names, case collisions).
This commit is the guard only; on the current tree it correctly fails because nine fixtures still exceed the limit. The rename that brings them back under it follows on this branch.
* fix(packaging): shorten content-filter benchmark fixtures under MAX_PATH
The 10 content-filter benchmark result fixtures used the legacy block_{topic}_-_contentfilter_({yaml}).json naming, up to 176 chars inside the wheel, which busts the Windows 260-char MAX_PATH limit once extracted under a realistic site-packages prefix and aborts `pip install litellm` on default Windows.
Rename them to the short {topic}_cf.json scheme that _save_confusion_results already emits today (it splits the label on the em-dash and writes f"{topic}_cf"), matching the insults_cf.json and investment_cf.json files fixed earlier. Re-running the eval suite now regenerates these same short names rather than recreating the long ones.
This drops the longest packaged path from 176 to 128, so the guard added in the previous commit goes from red to green with a 32-char margin.
* test(windows): tidy MAX_PATH guard per review
Close the wheel zip via a context manager rather than leaning on refcount collection, and select the wheel under dist/ by newest mtime so a stale artifact from an earlier build cannot be tested instead of the one just produced. Also pin down the venv-depth formula with a short note: the +2 is the separator joining the venv root to "Lib" plus the trailing separator before the entry, which lands the simulated site-packages prefix at exactly 100 chars.
|
||
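The length heuristic in #29597 can be illustrated with a short sketch. The function names are hypothetical and the exact comparison the real guard uses is an assumption; the 260-char limit, the 100-char prefix, and the 176 and 128 entry lengths are the figures quoted in the commit messages.

```python
import zipfile
from typing import Iterable, List

MAX_PATH = 260  # default Windows path limit named in the commit


def wheel_entry_names(wheel_path: str) -> List[str]:
    """List packaged paths, closing the zip via a context manager."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return wheel.namelist()


def overlong_entries(names: Iterable[str], prefix_len: int, limit: int = MAX_PATH) -> List[str]:
    """Return entries whose extracted path would exceed the limit.

    Assumption: an entry fails when prefix length plus entry length is
    greater than the limit.
    """
    return [name for name in names if prefix_len + len(name) > limit]
```

With a 100-char prefix, a 176-char entry totals 276 and is reported, while a 128-char entry totals 228, which leaves the 32-char margin mentioned above.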
|
|
f48a87ef12
|
fix(ci): normalize whitespace before classname-to-path awk on test rerun (#29475) | ||
|
|
a9cc6ed68c
|
test(e2e): cover PROXY_LOGOUT_URL redirect on Logout (#29080)
* test(e2e): cover PROXY_LOGOUT_URL redirect on Logout
Env-gated spec mirroring the existing serverRootPathRedirect pattern: when the proxy is booted with PROXY_LOGOUT_URL set, clicking Logout in the navbar must navigate to that external URL. The standard run_e2e.sh exports an empty value so the rest of the suite is unaffected; this spec self-skips unless the env var is populated.
* test(e2e): run PROXY_LOGOUT_URL spec in the suite + harden logout assertions
Boot the e2e proxy with PROXY_LOGOUT_URL set (job-level env in CircleCI and run_e2e.sh) so proxyLogoutUrl.spec.ts actually runs instead of self-skipping. Nothing else in the suite performs a logout, so this only affects the behavior under test.
Harden the spec to verify the logout flow rather than a URL substring:
- wait for /sso/get/ui_settings before clicking so logoutUrl is populated (otherwise window.location.href = "" silently reloads same-origin)
- assert a token cookie exists first, and is cleared after logout
- locate the dropdown via getByRole instead of internal antd CSS classes
- stub the external destination and assert on URL origin + path prefix
* test(e2e): assert exact PROXY_LOGOUT_URL on logout redirect
Replace the origin + startsWith(pathname) checks with a single normalized href comparison. With PROXY_LOGOUT_URL=https://www.example.com the path was "/", so startsWith("/") matched any path and left path/query/hash unchecked. Comparing normalized hrefs pins scheme, host, port, path, query and hash while still tolerating the browser's trailing-slash/default-port normalization.
|
||
|
|
f35e7eb2f6
|
feat(guardrails): add Microsoft Purview DLP guardrail (#24966)
* feat(guardrails): add Microsoft Purview DLP guardrail
* fix(guardrails/purview): raise_for_status on HTTP errors, cap scope cache, reuse executor
* fix(guardrails/purview): propagate litellm_call_id as correlation_id to Purview
* chore: fixes
* refactor(guardrails): delegate get_user_prompt to get_last_user_message
PurviewGuardrailBase duplicated AzureGuardrailBase (and OpenAIGuardrailBase)
user-prompt extraction. The same logic already lived in
common_utils.get_last_user_message; wire guardrail bases to that helper,
fix the helper docstring, and drop its redundant self-import of
convert_content_list_to_str.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(purview): make protection scope cache true LRU on hits
OrderedDict.get() does not update insertion order; call move_to_end on
TTL-valid cache hits so popitem(last=False) evicts least-recently-used
users instead of FIFO by first insert.
Add a regression test with a small max cache size.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* Fix mypy
* fix(guardrails/purview): harden user-id resolution and broaden DLP text
Prefer API key and proxy-injected metadata over client metadata for Entra
identity. Scan full message transcript pre-call and all completion choices
post-call. Align logging-only hook with the same user-id rules.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(guardrails/purview): scan /v1/completions prompt and TextChoices
Normalize text-completion prompts (string or list of strings); skip token-id-only
prompts. Run post-call DLP on TextCompletionResponse choices. Extend logging_only
hook for text_completion. Add tests and completion_prompt_to_str helper.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(purview-dlp): return data after DLP pass; per-call executor; dedupe text extraction
async_pre_call_hook now returns the request dict after a successful check so
callers match skip-path behavior. logging_hook uses a fresh ThreadPoolExecutor
per invocation like Presidio to avoid single-worker starvation. Response text
extraction is centralized in _completion_response_text_parts.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(purview): fix LRU cache refresh position and add Responses API scanning
Two fixes to the Microsoft Purview DLP guardrail:
1. LRU cache bug (base.py): When a stale scope cache entry was re-fetched,
the assignment updated the value but
Python's OrderedDict.__setitem__ preserves the original insertion order for
existing keys. This left the refreshed entry near the front of the dict,
making it the first candidate for LRU eviction via popitem(last=False).
Fix: call move_to_end(user_id) after every write to an existing key.
2. Responses API coverage gap (purview_dlp.py): Requests to /v1/responses use
an 'input' field instead of 'messages' or 'prompt', so the pre-call hook
returned without scanning the content. Similarly, post-call hook did not
handle ResponsesAPIResponse.output. Fix: add _responses_api_input_to_str()
helper and handle 'responses'/'aresponses' call types in async_pre_call_hook,
async_post_call_success_hook (via _completion_response_text_parts), and
async_logging_hook.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
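The two LRU fixes above (move_to_end on TTL-valid hits, and move_to_end after writing to an existing key) can be shown together in a small sketch. `ScopeCache` and its parameters are hypothetical names, not the class in base.py; the OrderedDict behaviour it relies on is standard library behaviour.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class ScopeCache:
    """Bounded TTL cache that evicts the least-recently-used key."""

    def __init__(self, max_size: int, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock
        self._data: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, user_id: str) -> Optional[Any]:
        entry = self._data.get(user_id)  # .get() alone does not reorder
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            return None  # stale: the caller re-fetches and calls put()
        self._data.move_to_end(user_id)  # a hit makes the key most recent
        return value

    def put(self, user_id: str, value: Any) -> None:
        self._data[user_id] = (self._clock(), value)
        # Assigning to an existing key keeps its old position, so move it.
        self._data.move_to_end(user_id)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict from the front (oldest)
```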
* fix(purview): message separator, non-blocking logging_hook, TextChoices type error
Three bugs fixed in the Microsoft Purview DLP guardrail:
1. get_prompt_text_for_dlp message separator (base.py)
- Previously called get_str_from_messages() which concatenated all message
texts with NO separator, so 'end of msg1' + 'start of msg2' became
'end of msg1start of msg2'.
- Now joins per-message text with '\n\n' via convert_content_list_to_str(),
preserving DLP pattern detection accuracy across message boundaries.
2. logging_hook blocking the event loop thread (purview_dlp.py)
- Previously called future.result() which blocked the calling thread
(often the event loop thread) for the entire round-trip of two sequential
Microsoft Graph API calls (_compute_protection_scopes + _process_content).
- Now fires and forgets: when called inside a running loop, schedules the
coroutine with loop.create_task(); otherwise spawns a daemon thread.
Returns (kwargs, result) immediately in both cases.
- Removes unused concurrent.futures.ThreadPoolExecutor import; adds threading.
3. Incompatible assignment type error (purview_dlp.py:180)
- mypy inferred 'choice' as TextChoices from the first loop body, then
flagged the assignment in the second loop as incompatible with Choices.
- Fixed by using distinct loop variable names: text_choice (TextChoices) and
chat_choice (Choices).
Tests: 7 new tests added covering the separator fix (TestGetPromptTextForDlp)
and the non-blocking logging_hook (TestLoggingHookNonBlocking).
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
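The fire-and-forget scheduling described in fix 2 above could be sketched as follows. `fire_and_forget` is a hypothetical helper, not the guardrail's logging_hook; it takes a coroutine factory so that no coroutine object is created before the code knows where it will run, and it returns the task or thread only to make the sketch testable.

```python
import asyncio
import threading
from typing import Any, Callable, Coroutine, Union


def fire_and_forget(
    make_coro: Callable[[], Coroutine[Any, Any, None]],
) -> Union["asyncio.Task[None]", threading.Thread]:
    """Schedule background work without blocking the caller."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread: run the coroutine on its own
        # short-lived loop inside a daemon thread.
        thread = threading.Thread(target=lambda: asyncio.run(make_coro()), daemon=True)
        thread.start()
        return thread
    # Inside a running loop: schedule and return immediately.
    return loop.create_task(make_coro())
```

One design consequence worth noting: the thread fallback runs on a different event loop from the main one, which is why state shared between the two paths needs a loop-independent lock.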
* fix(purview): suppress API errors in logging-only mode and scan tool-call arguments
Three issues fixed:
1. _check_content except block re-raised unconditionally even when
block_on_violation=False. The docstring promised 'log only - do not
raise' but network/API errors always propagated. Fixed by checking
block_on_violation before re-raising; when False, log a warning and
continue.
2. async_logging_hook used a single try/except wrapping both the prompt
and response audit calls. When the first _check_content (uploadText)
raised due to an API error the second call (downloadText) was silently
skipped. Fixed by giving each audit call its own try/except so both
always run independently.
3. convert_content_list_to_str() only reads message.content, so
tool_calls[].function.arguments and function_call.arguments were
invisible to the Purview pre-call and post-call scans. An authenticated
caller could embed sensitive text in tool-call arguments and bypass DLP.
Fixed by:
- Adding PurviewGuardrailBase._extract_tool_call_args_from_message()
which handles both dict and object-style messages, covering both
tool_calls[] arrays and the legacy function_call field.
- Updating get_prompt_text_for_dlp() to include those arguments
alongside message content (request/prompt path).
- Changing _completion_response_text_parts() from @staticmethod to an
instance method and adding tool-call argument extraction for
ModelResponse choices (response path).
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
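A reduced version of the tool-call extraction in fix 3 above is sketched here. It handles dict-style messages only, whereas the commit says the real _extract_tool_call_args_from_message also covers object-style messages; the function names below are hypothetical.

```python
from typing import Any, Dict, List


def extract_tool_call_args(message: Dict[str, Any]) -> List[str]:
    """Collect argument strings from tool_calls[] and legacy function_call."""
    args: List[str] = []
    for call in message.get("tool_calls") or []:
        value = (call.get("function") or {}).get("arguments")
        if isinstance(value, str) and value:
            args.append(value)
    legacy = (message.get("function_call") or {}).get("arguments")
    if isinstance(legacy, str) and legacy:
        args.append(legacy)
    return args


def message_text_for_dlp(message: Dict[str, Any]) -> str:
    """Join message content and tool-call arguments for scanning."""
    parts: List[str] = []
    content = message.get("content")
    if isinstance(content, str) and content:
        parts.append(content)
    parts.extend(extract_tool_call_args(message))
    return "\n\n".join(parts)
```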
* chore(ui): restructure pre-built Next.js output to directory-based routing
Flat page files (e.g. guardrails.html) replaced by directory-based
index.html equivalents (e.g. guardrails/index.html) matching the
Next.js App Router output format.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
* fix(purview): comprehensive security hardening — identity spoofing, streaming bypass, token-id gap
Four security issues addressed:
1. end_user_id kwargs fallback missing in _resolve_user_id_from_logging_kwargs
user_id already fell back to kwargs.get("user_api_key_user_id") when absent
from metadata, but end_user_id only checked md.get("user_api_key_end_user_id")
with no kwargs-level fallback. Added or kwargs.get("user_api_key_end_user_id").
2. Streaming responses bypassed post_call blocking
async_post_call_success_hook only runs on assembled non-streaming responses.
For streaming requests the proxy already delivered all content before the
hook ran, so raising HTTPException there had no effect. Added
async_post_call_streaming_iterator_hook which buffers the entire stream,
assembles it via stream_chunk_builder, runs the Purview DLP check, and only
then re-yields chunks via MockResponseIterator. If a violation is detected the
exception is raised before any bytes reach the client. The proxy automatically
skips async_post_call_success_hook for guardrails that define this method,
preventing duplicate scans.
3. Caller-controlled Purview user identity in blocking modes
When a LiteLLM API key has no bound user_id the guardrail fell back to
metadata[user_id_field], which is supplied by the caller. A caller could set
this to any Entra object ID whose Purview policies are more permissive and
bypass DLP. Added _resolve_trusted_user_id() that only returns identities
from the proxy auth system (user_api_key_dict.user_id, end_user_id, or
proxy-injected metadata["user_api_key_user_id"]). Added
_resolve_user_id_for_blocking() used by all blocking-mode hooks: tries
trusted sources first; if only caller-supplied is available, logs a
SECURITY WARNING and still proceeds (backward compat); if nothing resolves,
skips with a warning.
4. Token-id prompt DLP bypass
When /v1/completions received a pure token-id array prompt,
completion_prompt_to_str() returned None and the pre_call hook silently
skipped the Purview scan. An authenticated caller could tokenize blocked
text and send it without DLP evaluation. The hook now detects this case
(raw_prompt present but prompt_text None) and logs a WARNING while letting
the request pass through — token-id payloads are opaque at the text layer
and cannot be scanned. This makes the gap explicit rather than silent.
Tests: 94 total, all passing.
Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>
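The buffer-then-scan approach in item 2 above can be illustrated with plain strings. This sketch does not use stream_chunk_builder or MockResponseIterator; `scan_then_replay`, `PolicyViolation`, and the `is_allowed` callback are hypothetical stand-ins for the Purview check.

```python
import asyncio
from typing import AsyncIterator, Callable, List


class PolicyViolation(Exception):
    """Raised when the assembled text fails the check."""


async def scan_then_replay(
    stream: AsyncIterator[str],
    is_allowed: Callable[[str], bool],
) -> AsyncIterator[str]:
    """Buffer the whole stream, check it, and only then re-yield chunks."""
    buffered: List[str] = []
    async for chunk in stream:
        buffered.append(chunk)
    if not is_allowed("".join(buffered)):
        # Raised before the first yield, so no chunk reaches the consumer.
        raise PolicyViolation("blocked before any chunk was released")
    for chunk in buffered:
        yield chunk
```

The cost of this design is latency: the client sees nothing until the full response has been produced and scanned.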
* Revert "chore(ui): restructure pre-built Next.js output to directory-based routing"
This reverts commit c70c4303b735bb3885732bd4a0e01997e9571f56.
* fix(purview): fail closed on identity spoofing, token prompts, and path encoding
Encode Entra user IDs in Graph paths, guard caches with asyncio.Lock, scan
Responses API instructions with string input, reject caller-only metadata and
token-id completion prompts in blocking mode, and revert unrelated UI HTML
restructure from the PR branch.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(purview): use threading.Lock and getattr for LitellmParams
- Replace asyncio.Lock with threading.Lock in PurviewGuardrailBase.
The cache lock is acquired both from the proxy's main event loop and
from short-lived event loops created by the logging_hook thread
fallback. In Python 3.10+ an asyncio.Lock is bound to the first event
loop that acquires it, so the second loop would silently break audit
logging with RuntimeError. All critical sections are in-memory dict
ops with no awaits, so a synchronous lock is safe.
- Use getattr() on LitellmParams in initialize_guardrail() instead of
.get(), which does not exist on Pydantic BaseModel instances and
would raise AttributeError at runtime. Tests updated to construct
Mock objects with spec= so they reflect the real interface.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* refactor(purview): dedupe trust-level user resolution and drop dead code
- _resolve_user_id now delegates levels 1-3 to _resolve_trusted_user_id
so blocking and non-blocking paths share a single source of truth.
- Drop redundant event_hook override in MicrosoftPurviewDLPGuardrail.__init__
(initialize_guardrail already forwards event_hook=litellm_params.mode).
- Drop unused self._logging_only attribute; blocking is controlled by the
block_on_violation argument passed to _check_content.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview): fail-closed on responses API transform error; avoid duplicate audit calls
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview): fail-closed blocking DLP; revert directory-based UI HTML
Blocking hooks now require UserAPIKeyAuth user_id/end_user_id only (no
spoofable metadata), re-raise Responses API transform errors, scan streamed
text completions, and reject requests with no bound identity. Reverts the
accidental directory-based Next.js output from cc47081 (c70c4303b7).
Co-authored-by: Cursor <cursoragent@cursor.com>
* Remove dead code in purview_dlp: _resolve_user_id_for_blocking never returns falsy
The method either returns a non-empty trusted user id or raises HTTPException,
so the 'if not user_id' guards in async_pre_call_hook and async_post_call_success_hook
were unreachable. Tighten the return type to str and drop the dead checks to
make the fail-closed behavior explicit.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview): exclude caller-controlled end_user_id from blocking DLP
Blocking Purview checks now use only API-key/JWT-bound user_id, not
end_user_id populated from request user/metadata/safety_identifier.
Co-authored-by: Cursor <cursoragent@cursor.com>
* style(purview): apply Black formatting to base.py
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(purview): use post-await timestamp for cache TTL
Capture the timestamp after the network call completes when storing it
as the cache freshness marker, so the effective TTL reflects when the
response was actually received rather than when the request started.
Under high network latency the previous behavior shortened the
effective cache lifetime.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview_dlp): fail closed when stream_chunk_builder returns None
stream_chunk_builder can return None (e.g., when ChunkProcessor filters
all chunks), causing both isinstance checks to fail and the buffered
chunks to be released without DLP scanning. Explicitly fail closed in
that case by raising an HTTPException so the streaming DLP guardrail
does not bypass policy enforcement.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(purview_dlp): resolve user_id before buffering stream
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* merge main (#28629)
* test(vcr): classify cache verdicts, detect live calls, surface cost leaks
Convert the per-test VCR verdict line from a single 'NOOP / HIT / MISS /
PARTIAL' tag into a classified outcome that distinguishes the cases that
silently bill the live API on every CI run from the ones that don't:
HIT pure replay
PARTIAL mixed replay + new recordings
MISS:RECORDED new cassette saved to Redis (cached next run)
MISS:OVERFLOW cassette > MAX_EPISODES_PER_CASSETTE; persister
refused to save; re-bills every run
MISS:NOT_PERSISTED test failed; save_cassette skipped; re-bills
NOOP VCR-marked but no HTTP traffic (mocked elsewhere)
UNMARKED:LIVE_CALL test bypassed VCR AND opened a TCP connection
to a known LLM provider host -> wasted spend
UNMARKED:NO_TRAFFIC test bypassed VCR but didn't call out
The UNMARKED:LIVE_CALL signal is what converts 'this test probably hits
live' into 'this test connected to api.openai.com'. We install a
socket.connect / socket.create_connection wrapper for the duration of
each non-VCR-marked test and record any outbound TCP to a known LLM
provider hostname. The probe sits below the httpx layer so vcrpy and
respx (which both patch above the socket) are unaffected.
Replace the file-level _RESPX_CONFLICTING_FILES blacklists in the
llm_translation and local_testing conftests with per-item respx
detection in apply_vcr_auto_marker_to_items. A test now skips VCR when
it actually carries @pytest.mark.respx or has respx_mock in its fixture
chain - not just because some other test in the same file imports
MockRouter. Items skipped by skip_files are split into respx_conflict
(real conflict, the module wires up respx) vs file_opt_out (dead skip-
list entry whose module never touches respx) so the session summary
makes pruning obvious.
Stabilize the AWS SigV4 fingerprint: the Authorization header on
Bedrock requests rotates its Credential date and Signature on every
call, which previously pushed every Bedrock test past the 50-episode
overflow threshold. Extract the access-key id only
('aws-sigv4:AKIA...') so two requests with the same identity match.
Always emit verdict logging when VCR is active (set
LITELLM_VCR_VERBOSE=0 to opt back into the legacy quiet mode). Add a
session-end classification summary that lists overflow tests, unmarked
live-call tests, and the skip-reason breakdown.
Wire the live-call probe + summary hook into every test directory that
already uses the Redis-backed VCR cache (audio_tests, guardrails_tests,
image_gen_tests, litellm_utils_tests, llm_responses_api_testing,
llm_translation, local_testing, logging_callback_tests, ocr_tests,
pass_through_unit_tests, router_unit_tests, search_tests,
unified_google_tests).
Add tests/llm_translation/test_vcr_classification.py covering the
verdict classifier, skip-reason tagging, AWS SigV4 fingerprint stability,
live-host classification, and session summary rendering.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
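The SigV4 fingerprint stabilization described above might be sketched like this. The function name and regular expression are assumptions; the commit only states that the access-key id is extracted and rendered as 'aws-sigv4:AKIA...'.

```python
import re
from typing import Optional

# Matches the access-key id at the start of the Credential scope.
_CREDENTIAL_RE = re.compile(r"Credential=([^/,\s]+)/")


def sigv4_fingerprint(authorization: str) -> Optional[str]:
    """Reduce a SigV4 Authorization header to a stable identity string.

    The date in the Credential scope and the Signature change per request,
    so only the access-key id is kept.
    """
    if not authorization.startswith("AWS4-HMAC-SHA256"):
        return None
    match = _CREDENTIAL_RE.search(authorization)
    if match is None:
        return None
    return f"aws-sigv4:{match.group(1)}"
```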
* test(vcr): drop dead 'from respx import MockRouter' imports
These seven test files were on _RESPX_CONFLICTING_FILES, which made the
auto-marker skip them entirely. Inspecting the source shows the only
respx artifact is a top-level 'from respx import MockRouter' that no
test ever uses - no @pytest.mark.respx, no respx_mock fixture, no
respx.mock context manager. The import is dead code left over from a
previous mocking pattern.
Now that apply_vcr_auto_marker_to_items detects respx per-item via the
marker / fixture chain (
|
||
|
|
07bcd2c19e
|
test(e2e): forward LITELLM_LICENSE to UI e2e proxy (#28398)
* test(e2e): forward LITELLM_LICENSE to UI e2e proxy
The UI e2e job ran without LITELLM_LICENSE, so premium_user was always false in the issued login JWT and premium-gated UI surfaces (Team-BYOK Model switch, etc.) couldn't be driven through the UI. Forward the env var from run_e2e.sh and the CircleCI e2e_ui_testing job, and add a sanity test that decodes the admin storage state token and asserts premium_user=true so the wiring fails loudly if it ever regresses.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Update ui/litellm-dashboard/e2e_tests/tests/proxy-admin/license.spec.ts
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
||
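The sanity check in #28398 lives in a TypeScript spec; the same idea in Python, as a rough sketch, is to decode the JWT payload segment without verifying the signature and read the claim. `decode_jwt_payload` is a hypothetical helper, and it is only suitable for test assertions, never for authentication.

```python
import base64
import json
from typing import Any, Dict


def decode_jwt_payload(token: str) -> Dict[str, Any]:
    """Decode the middle (payload) segment of a JWT without verification."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```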
|
|
8acf64e16c
|
fix(interactions): never drop streamed text deltas; always emit terminal completion (#28394)
* fix(interactions): never drop streamed text deltas; always emit terminal completion
The interactions streaming bridge had two bugs flagged by Greptile on PR #28153:
1. The first OutputTextDeltaEvent (and the second, when no ResponseCreatedEvent precedes the deltas) was consumed to emit a synthetic interaction.created / step.start event, but the chunk's text payload was never forwarded as a step.delta. The text only reappeared in the terminal step.stop, which defeats the purpose of incremental streaming.
2. When the upstream Responses API stream ended via StopIteration without a ResponseCompletedEvent, the iterator emitted step.stop but never the terminal interaction.completed event carrying the full collected text.
This refactors the iterator to translate each upstream chunk into a list of events (instead of a single event) and buffers them in a deque. A text delta now expands into [interaction.created, step.start, step.delta] on the first chunk so no token is dropped, and the StopIteration / StopAsyncIteration fallback always flushes a terminal interaction.completed event when one hasn't already been sent.
Both behaviors are covered by new unit tests:
- test_no_text_token_is_dropped_during_streaming
- test_response_created_then_text_delta_emits_step_start_and_delta
- test_stop_iteration_fallback_emits_completion_event
- test_response_completed_emits_stop_then_completion (no double-emit)
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
* fix(interactions): correlate EOF terminal events with stream's interaction id
The StopIteration fallback path previously built the terminal step.stop / interaction.completed events with id=None (legacy content.stop) and a memory-address fallback string (interaction.completed), neither of which matched the item_id used by the earlier interaction.created / step.start / step.delta events in the same stream. Downstream consumers correlating events by id would see a mismatch.
Persist the interaction id derived from the first upstream chunk (item_id on an OutputTextDeltaEvent, or response.id on a ResponseCreatedEvent) and reuse it when flushing the terminal events on EOF.
Author: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
* ci(windows): raise UV_HTTP_TIMEOUT to 300s for uv sync
The using_litellm_on_windows job has been hitting flaky PyPI download timeouts during 'uv sync --frozen --group dev' — different packages on each rerun (six, pydantic-core), all surfacing the same uv error:
Failed to download distribution due to network timeout. Try increasing UV_HTTP_TIMEOUT (current value: 30s).
uv's default 30s per-request timeout is too tight for the Windows runner on this project (50+ deps, several multi-MB wheels), so bump it to 300s to let slow individual downloads complete instead of failing the build.
* fix(interactions): correlate ResponseCompletedEvent terminal events with stream's interaction id
When a stream starts directly with OutputTextDeltaEvent (no preceding ResponseCreatedEvent), interaction.created carries item_id while interaction.completed previously carried response.id from ResponseCompletedEvent. The two ids can differ, leaving consumers that correlate events by id unable to match the start and completion events. Fall back to self._interaction_id (set on the first chunk that derives an id) before response.id, mirroring the EOF terminal path.
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
|
||
|
|
62dca9e977
|
fix(ci): flag codecov uploads, enable carryforward, close coverage gaps (#28028)
* fix(ci): flag codecov uploads and enable carryforward Coverage uploads from GHA and CircleCI were unflagged. Commits that receive the push-triggered workflows more than once (re-runs, or branches cut at the same SHA) accumulated many overlapping flagless sessions, and Codecov's per-commit merge dropped the largest, ubiquitously-imported files (router.py, proxy_server.py, main.py, utils.py, cost_calculator.py) from the report even though the uploaded XMLs contained them. - codecov.yaml: flag_management.default_rules.carryforward: true - GHA reusable bases: tag each upload with its workflow/shard name - CircleCI: tag the combined upload "circleci"; also combine the agent / google_generate_content_endpoint / litellm_utils datafiles that were produced and required but missing from the combine list * fix(ci): close coverage gaps in proxy-legacy, router-unit, auth-ui, caching-redis - test-unit-proxy-legacy: route through _test-unit-base so the full proxy_unit_tests suite (incl. comprehensive test_proxy_server*.py) is measured and uploaded with per-group flags (was plain pytest, no --cov) - _test-unit-services-base: declare the enable-redis input + the six secrets test-unit-caching-redis passes; that workflow had a workflow_call signature mismatch and startup_failed on every push (never ran). Changes are additive/optional - proxy-db and security callers unchanged - circleci: add --cov + persist + combine + upload-coverage requires for litellm_router_unit_testing (tests/router_unit_tests) and auth_ui_unit_tests (tests/proxy_admin_ui_tests); neither was covered anywhere. Redundant -k subset jobs left as-is (local_testing covers them) * fix(ci): remove dead GHA Redis workflow; keep Redis on CircleCI only CircleCI redis_caching_unit_tests already runs the exact same files (tests/local_testing/test_dual_cache.py, test_redis_batch_optimizations.py, test_router_utils.py) with --cov, and that datafile is already combined and uploaded. 
The GHA test-unit-caching-redis workflow was redundant and had never run (workflow_call signature mismatch -> startup_failure on every push). - Delete .github/workflows/test-unit-caching-redis.yml - Revert _test-unit-services-base.yml to the flag-fix state (drop the enable-redis input / secrets / env wiring added only to prop up the GHA Redis workflow); the verified per-upload flags line is kept - The only single-star "litellm_*" branch glob lived in the deleted file; no other single-star globs exist, so none remain to widen * fix(ci): keep proxy-legacy as a standalone job to preserve required check names Routing proxy-legacy through the reusable workflow renamed each check from the bare matrix name (e.g. "proxy-response-and-misc") to "proxy-response-and-misc / Run tests". Those bare names are required status checks in branch protection, so the old contexts never reported and PRs sat "Expected — Waiting for status to be reported" indefinitely. Restore the original standalone matrix job (job name == matrix name, so the required contexts report again) and add coverage in place: --cov on pytest plus an OIDC Codecov upload flagged proxy-legacy-<group>. Net effect of the gap-#2 fix is preserved (flagged coverage for tests/proxy_unit_tests/**) without changing any check name. * revert(ci): drop all proxy-legacy changes from this PR tests/proxy_unit_tests/** is already fully covered by test-unit-proxy-db (its shard-coverage guard fails CI if any file in that dir is unassigned), which this PR already flags + carryforwards. Adding --cov and id-token:write to the legacy pull_request job was redundant and put OIDC on a job that runs untrusted PR code. Restore the file to the base version verbatim so this PR no longer touches proxy-legacy at all (also restores its original required check names). Retiring proxy-legacy in favor of proxy-db on pull_request is a separate effort that needs a branch-protection change. |
||
|
|
538092a55f
|
ci: use --cov=./litellm so coverage paths resolve unambiguously in Codecov
pytest-cov treats --cov=<module-name> as a Python package and emits XML paths relative to the package root, stripping the litellm/ prefix (`proxy/proxy_server.py` instead of `litellm/proxy/proxy_server.py`). Codecov's auto-prefix heuristic then drops every file whose basename is ambiguous in the repo — `proxy_server.py` (3 copies under enterprise/), `router.py` (2 copies), `utils.py` (20+), `main.py` (20+), `constants.py` (2). The 11 highest-fix-rate hotspots have never appeared in Codecov. Switching to --cov=./litellm treats the argument as a path, which makes coverage.xml emit repo-relative paths (`litellm/proxy/proxy_server.py`). Each path is unambiguous, so Codecov resolves all files correctly. Verified locally: rerunning a single proxy_unit_tests test with --cov=./litellm produced `filename="litellm/proxy/proxy_server.py"`, `filename="litellm/router.py"`, and `filename="litellm/types/router.py"` as distinct entries — exactly the disambiguation Codecov needs. Touches every workflow that uploads coverage: the two reusable GHA workflows (_test-unit-base.yml, _test-unit-services-base.yml), test-mcp.yml, and all 14 invocations in .circleci/config.yml. |
||
|
|
fdaa288607
|
ci(circleci): enable Rerun Failed Tests for all pytest jobs (#27155)
* ci(circleci): enable Rerun Failed Tests for all pytest suites
Migrated every pytest-based CircleCI job that uploads JUnit results to use 'circleci tests run' instead of invoking pytest directly. This is the prerequisite for CircleCI's 'Rerun failed tests' feature to be available on each job in the pipeline.
For each job:
- Glob test files via 'circleci tests glob' and pipe them into 'circleci tests run --command="xargs ... pytest ..."' so the agent can feed the failed-test subset on rerun.
- Preserve all original pytest flags (parallelism, timeouts, retries, coverage, junit output paths).
- For jobs that previously lacked 'store_test_results' (proxy spend accuracy, proxy_build_from_pip, db_migration_disable_update_check), add the step so JUnit XML is uploaded and rerun is actually wired up.
- Replace the dynamic IGNORE_DIRS shell array in llm_translation_testing with a 'grep -v' filter on the glob output, matching the previous behavior of skipping tests/llm_translation/realtime.
- For 'build_and_test', glob 'tests/test_*.py' (top-level only) which matches the prior 'tests/*.py' shell glob; the long list of '--ignore=tests/<subdir>' flags was vestigial and is dropped.
Jobs already using 'circleci tests run' (local_testing_part1/2, litellm_router_testing) are unchanged.
* fix(ci): convert classnames to file paths on rerun
CircleCI's Rerun Failed Tests sends each previously failed test as a JUnit classname (e.g. 'tests.otel_tests.test_key_logging_callbacks'), but pytest needs a file path. Without the awk preprocess step, rerun runs fail with 'file or directory not found'. Mirror the awk transform that local_testing_part1, local_testing_part2, and litellm_router_testing already use, so rerun works in every job that this PR migrated to 'circleci tests run'.
* ci: drop -x from OTEL pytest run so all failures are reported
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com> |
||
|
|
19ad964c4a
|
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/vigorous-albattani-2b7480 | ||
|
|
c1c0506d2c
|
[Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job
The cimg/python:3.12-browsers base image already ships every Chromium system dependency Playwright needs (libnss3, libatk-bridge2.0-0, libcups2, etc.; the install log shows them all as "already the newest version"). Passing --with-deps to `npx playwright install` therefore runs an apt-get update + install for nothing, but pays the full cost of hitting Ubuntu mirrors. On a recent run those mirrors stalled hard: apt-get update alone took 6m53s at 81.5 kB/s with several archives returning connection refused. Drop --with-deps and persist ~/.cache/ms-playwright alongside node_modules so the Chromium binary is also reused across runs. Bump the cache key to v2 so the existing v1 entry (which only contained node_modules) is not restored in place of one that includes the new browser path. |
||
|
|
0976fbc6c4
|
[Fix] Tests: Restore /metrics access for prometheus test suite
/metrics now requires auth by default; tests/otel_tests/test_prometheus.py makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so every prometheus test in CI now fails the metric assertion. Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml to opt out for this test job, which scrapes /metrics directly. Verified locally: 8/8 prometheus tests green (one flaky retry on test_proxy_success_metrics that pre-dates this PR). Also drop the -x stop-on-first-failure flag from the otel test command so all failures in the job surface in a single CI run rather than hiding behind whichever one trips first. |
||
|
|
727ab8dcc4
|
[Fix] Proxy: Break managed-resources import cycle on Python 3.13
The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module
ImportError when loading the managed files hook chain:
litellm.proxy.hooks/__init__ (mid-import)
-> enterprise.enterprise_hooks
-> litellm_enterprise.proxy.hooks.managed_files
-> litellm.llms.base_llm.managed_resources.isolation
-> litellm.proxy.management_endpoints.common_utils
-> litellm.proxy.utils (re-enters litellm.proxy.hooks)
The except ImportError block in hooks/__init__.py silently swallowed the
failure, leaving managed_files unregistered and POST /files returning
500 "Managed files hook not found".
Two-layer fix:
- Inline the 3-line _user_has_admin_view check in isolation.py instead
of importing it from litellm.proxy.management_endpoints.common_utils.
litellm.llms.* should not depend on litellm.proxy.* — removing this
layering violation breaks the cycle at its root.
- Define PROXY_HOOKS and get_proxy_hook before the conditional
enterprise import in litellm/proxy/hooks/__init__.py, so any future
re-entry resolves the public names instead of hitting an
ImportError on a partially-initialized module.
Also fold in two unrelated CCI repairs surfaced in the same staging run:
- tests/otel_tests/test_key_logging_callbacks.py: per-key
gcs_bucket_name / gcs_path_service_account are now stripped by
initialize_dynamic_callback_params, so the GCS client falls through
to the env-only branch. Update the assertion to match the new
"GCS_BUCKET_NAME is not set" message.
- .circleci/config.yml: tests/pass_through_tests now resolves
google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump,
which uses dynamic ESM imports Jest 29 cannot load without
--experimental-vm-modules. Pass that flag in the Vertex JS test step.
Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a
regression guard: managed_files / managed_vector_stores must register,
and isolation.py must not transitively import litellm.proxy.utils.
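The second layer of the fix (binding the public names before the optional import) follows a general pattern that can be sketched as below. This is only an illustration under assumptions: `hypothetical_enterprise_hooks` and `ENTERPRISE_PROXY_HOOKS` are made-up names, and the real `litellm/proxy/hooks/__init__.py` may be organized differently.

```python
from typing import Any, Dict, Optional

# Public names are bound first, so a re-entrant import of this module
# during the optional import below already finds them defined.
PROXY_HOOKS: Dict[str, Any] = {}


def get_proxy_hook(name: str) -> Optional[Any]:
    """Return the registered hook for `name`, or None if it is not registered."""
    return PROXY_HOOKS.get(name)


try:
    # Hypothetical optional package, used only for illustration.
    from hypothetical_enterprise_hooks import ENTERPRISE_PROXY_HOOKS
except ImportError:
    # The optional package is absent: continue with the base hooks only.
    ENTERPRISE_PROXY_HOOKS = {}

PROXY_HOOKS.update(ENTERPRISE_PROXY_HOOKS)
```

With this ordering, a module that re-enters the package mid-import can still resolve `PROXY_HOOKS` and `get_proxy_hook`, although the hooks contributed by the optional import are not registered until that import finishes.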
|
||
|
|
82dacfb746
|
Merge pull request #26461 from BerriAI/litellm_fix_circleci_rerun
fix(ci): support CircleCI rerun failed tests for local_testing jobs |
||
|
|
68d4420233 |
fix(ci): strip trailing class segment from JUnit classnames before pytest
Pytest tests inside a class produce JUnit XML classnames like 'tests.local_testing.test_file_types.TestFileConsts' (module + class). The previous awk preprocessor would convert this to 'tests/local_testing/test_file_types/TestFileConsts.py', which doesn't exist, causing pytest to collect 0 items on rerun. Strip a trailing '.<UppercaseSegment>' before the dot-to-slash conversion. Module path segments are lowercase (test files start with 'test_'), and the class name is the only segment beginning with an uppercase letter, so this is unambiguous. Verified affected files in tests/local_testing/: test_file_types.py (TestFileConsts), test_gcs_cache_unit_tests.py, test_disk_cache_unit_tests.py, test_docker_no_network_on_deploy.py, test_sagemaker_nova_integration.py, test_cache_preset_key.py. |
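The CI jobs do this conversion with an awk preprocessor; the same logic can be sketched in Python for illustration. The function name `classname_to_path` is hypothetical, and treating every identifier without '.py' as a dotted classname is a simplifying assumption.

```python
import re


def classname_to_path(identifier: str) -> str:
    """Convert a JUnit classname into a pytest file path."""
    # File paths that already contain '.py' pass through unchanged.
    if ".py" in identifier:
        return identifier
    # Strip a trailing '.<UppercaseSegment>' (the test class name), if present.
    module = re.sub(r"\.[A-Z][^.]*$", "", identifier)
    # Convert the remaining dotted module path into a file path.
    return module.replace(".", "/") + ".py"
```

The uppercase check relies on the convention stated in the commit: module path segments are lowercase, so only a class name starts with a capital letter.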
||
|
|
ed0a965208 |
fix(ci): convert dot-notation test paths to file paths for CircleCI rerun
CircleCI's 'Rerun failed tests' feature passes test identifiers from the JUnit XML classname attribute (dot notation, e.g. 'tests.local_testing.test_router') via stdin. pytest receives these paths and collects 0 items, causing the rerun to exit 123 with no tests run. Add an awk preprocessor before xargs that detects dot-notation module paths and converts them to file paths (tests/local_testing/test_router.py). File paths already containing '.py' are passed through unchanged. Applied to all three jobs using the 'circleci tests run' + 'xargs pytest' pattern: local_testing_part1, local_testing_part2, and the router test job. |
||
|
|
8e652d129d
|
Merge pull request #26356 from BerriAI/litellm_cci_gha_dedup_and_shard
[Infra] Remove CCI/GHA test duplication and semantically shard proxy DB tests |
||
|
|
7c69262279
|
Merge pull request #26349 from BerriAI/litellm_deflakeSpendTests
[Fix] Deflake spend tracking tests |
||
|
|
c2f40e89d5
|
[Infra] Remove CCI/GHA test duplication and semantically shard proxy DB tests
Split into two related cleanups:
1. Delete CCI jobs that duplicate GHA coverage:
- mcp_testing (tests/mcp_tests) — already run by test-mcp.yml
- litellm_mapped_tests_proxy_part1/part2 (tests/test_litellm/proxy) —
already run across test-unit-proxy-auth.yml, test-unit-proxy-endpoints.yml,
and test-unit-proxy-infra.yml
Add rag_endpoints and realtime_endpoints to test-unit-proxy-endpoints.yml
(they were only covered by the deleted CCI part2 job).
Remove the corresponding workflow wiring, coverage combine entries, and
upload-coverage dependencies in .circleci/config.yml.
2. Re-shard test-unit-proxy-db.yml from 4 alphabetic buckets to 8 semantic
ones (auth-and-jwt, proxy-server, logging-and-callbacks, db-and-spend,
guardrails-budget-hooks, endpoints-and-responses, plus the existing
serial key-generation and test_proxy_utils.py shards). New test files are
placed in whichever group they belong to instead of reshuffling slices.
Add a dist input to _test-unit-services-base.yml so the test_proxy_utils.py
shard can use --dist=worksteal to spread its ~64 (many parametrized)
functions across workers; the default --dist=loadscope pins a single file
to a single worker, which was the root cause of that shard running 10m+.
|
||
|
|
4af2b67357
|
[Fix] Drop orphan teardown step from Greptile merge
Previous commit from greptile-apps added a new `when: always` teardown step without removing the prior `name:`-only step, leaving a `- run` block with no `command:` — CircleCI config validation rejects that. Collapse back to a single teardown step that runs on success and failure. |
||
|
|
8adb3a6a8f
|
Apply suggestion from @greptile-apps[bot]
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> |
||
|
|
e37d1b0cb6
|
[Fix] Deflake spend tracking tests
Two independent deflakes:
1. test_ui_view_spend_logs_unauthorized (unit) was returning 400 instead of 401/403 when earlier tests in the file left proxy-auth globals (prisma_client, master_key, user_custom_auth, general_settings, user_api_key_cache) in a state that let invalid tokens pass auth and fall through to the endpoint's own start_date/end_date validation. Add an autouse fixture that pins those globals to their import-time defaults for every test in the file. Harden the assertion to include response body so future flakes are diagnosable.
2. test_basic_spend_accuracy (CI job proxy_spend_accuracy_tests) depends on the Redis transaction buffer flushing spend to Postgres. The buffer uses a single global pod-lock key (cronjob_lock:db_spend_update_job) and a single global buffer list key. Pointing the proxy at the shared remote Redis means concurrent CI pipelines contend for the same lock and can drain each other's buffer into the wrong database. Add a start_redis reusable command that boots a per-job redis:7-alpine container (digest-pinned), and switch proxy_spend_accuracy_tests to REDIS_HOST=host.docker.internal:6379 so lock and buffer state are isolated per CI run. |
||
|
|
03a022436b
|
[Infra] CCI: run RVM install from its own checkout dir
The rvm/install script sources scripts/functions/installer using paths relative to the caller's working directory (not $0), so invoking /tmp/rvm/install from /home/circleci/project fails with 'No such file or directory'. Switch to (cd /tmp/rvm && ./install). |
||
|
|
eb6a2d043c
|
[Infra] CCI: pin Ruby and Node.js installs in proxy_pass_through_endpoint_tests
Align the Ruby, Node.js, and npm install path with the rest of the config. Three separate upstream installers were being invoked via `curl ... | bash` or unlocked `npm install`:
- RVM's `get.rvm.io/stable` installer (mutable upstream script). Replace with a shallow git clone of the rvm/rvm repo at tag 1.29.12 and verify HEAD matches the published commit SHA before running the local `./install` script. Same pattern already used for the helm-unittest plugin in .github/workflows/helm_unit_test.yml.
- NodeSource's `deb.nodesource.com/setup_18.x` piped into sudo bash. Replace with a direct download of the Node.js 18.20.8 linux-x64 tarball from nodejs.org, verified against the published SHASUMS256.txt digest before extraction.
- `npm install @google-cloud/vertexai @google/generative-ai` and `--save-dev jest` resolved fresh from the npm registry on every run. Add `tests/pass_through_tests/package.json` with pinned direct-dep versions and commit the generated package-lock.json, then switch CI to `npm ci` (exact lockfile install, fails on drift).
Also scopes the Ruby+JS test runners to `tests/pass_through_tests/` so they pick up the committed package.json rather than writing node_modules at repo root. |
||
|
|
a12a2190d7
|
[Infra] Flip remaining CI jobs to Python 3.12
Stragglers from the 2026-04-21 Python 3.12 standardization:
- .github/workflows/check_duplicate_issues.yml (was 3.11)
- .github/workflows/llm-translation-testing.yml (was 3.11)
- .github/workflows/scan_duplicate_issues.yml (was 3.13)
- .circleci proxy_build_from_pip_tests (was 3.13)
The only intentional non-3.12 CI job is installing_litellm_on_python_3_13, which exists as an explicit "latest supported Python" smoke matrix. |
||
|
|
547d60c642
|
[Infra] CCI: match Windows uv install path to Linux verification pattern
The Windows uv install step was piping a remote install.ps1 into Invoke-Expression without any integrity check, while the Linux install steps (install_uv command, line 89) download to a file, verify SHA-256 against a hardcoded digest, and only then execute. Bring the Windows path to the same pattern. Also hardcode the kubectl v1.31.4 checksum in helm_chart_testing instead of fetching kubectl.sha256 from the same origin as the binary — if dl.k8s.io were ever to serve a tampered pair, a co-hosted checksum provides no additional integrity. |
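The download, verify, then execute pattern described here comes down to comparing a file's SHA-256 against a digest pinned in the config. A minimal sketch of that check, assuming the installer bytes are already on disk or in memory (the function names are hypothetical, and the CI steps themselves use shell and PowerShell rather than Python):

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True only if payload hashes to the pinned hex digest."""
    actual_hex = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual_hex, expected_hex.lower())


def require_sha256(payload: bytes, expected_hex: str) -> bytes:
    """Return payload unchanged, or raise before anything gets executed."""
    if not verify_sha256(payload, expected_hex):
        raise ValueError("SHA-256 mismatch: refusing to run downloaded installer")
    return payload
```

Pinning the digest in the config, rather than fetching it from the same origin as the binary, is what gives the check its value: a tampered binary served together with a matching tampered checksum would otherwise pass.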
||
|
|
44362cb167
|
[Infra] CCI: factor repeated filters and Python docker image to YAML anchors
The same branch filter block appeared 46 times in the workflow
declaration:
filters:
branches:
only:
- main
- /litellm_.*/
And the same pinned Python docker image appeared 29 times in jobs:
- image: cimg/python:3.12@sha256:9c796c...
auth:
username: ${DOCKERHUB_USERNAME}
password: ${DOCKERHUB_PASSWORD}
Replace with YAML anchors declared at first use:
- `&main_branches` on using_litellm_on_windows's filters block;
all other job entries reference it as `filters: *main_branches`.
- `&python312_image` on local_testing_part1's first docker image
entry; all other jobs reference `- *python312_image`, including
the multi-image jobs (auth_ui_unit_tests,
installing_litellm_on_python_v2_migration_resolver) which keep
their postgres sidecar entry inline afterwards.
Net result: one place to change when the image digest rolls or
the branch-filter convention changes. No behavior change — YAML
anchor resolution produces identical config at parse time.
Also adds Docker Hub auth block to upload-coverage (previously
pulled anonymously). No functional difference for a public
image, but avoids Docker Hub rate limits now that we reuse the
same entry.
|
||
|
|
bea872a034
|
[Infra] CCI: remove dead steps accumulated across jobs
Clean out copy-paste debug and workaround lines that serve no purpose:
- `pwd && ls` echoes at the top of 30 "Run tests" steps (CCI already logs working_directory on every step).
- "Show git commit hash" in local_testing_part1/part2 and langfuse_logging_unit_tests (CCI shows the SHA in every job header).
- "Verify Docker is available" stubs in 6 machine-executor jobs (machine executors always have Docker).
- `sudo systemctl restart docker` in proxy_store_model_in_db_tests (one-off workaround; not used anywhere else).
- Duplicated Black formatting step in local_testing_part1 and local_testing_part2; Black runs in the lint job, no reason to run it again here.
- Second back-to-back `helm test litellm --logs` invocation in helm_chart_testing (one call is enough).
No behavior change: these are all log-only or no-op steps. |
||
|
|
28e1d2f1a6
|
[Infra] CCI: unify uv cache key and cache only ~/.cache/uv
Consolidate 7 distinct cache-key prefixes (v2-dependencies-,
v1-router-testing-deps-, v1-router-unit-deps-, v1-llm-translation-deps-,
v1-llm-responses-deps-, v3-litellm-uv-deps-, ui-e2e-py-deps-v2-) onto a
single v1-uv-cache-<uv.lock checksum> key shared across all Python jobs.
Cache only ~/.cache/uv (the content-addressed uv download cache,
hash-verified against uv.lock at install time). Drop ./.venv,
~/.local/{bin,lib}, and /home/circleci/.{pyenv,local} from cache paths.
~/.cache/uv is the only path uv sync needs to avoid re-downloading from
PyPI; everything else is rebuilt each run from that verified cache.
Remove partial-prefix restore-keys fallbacks — cache either hits exactly
on the uv.lock hash or rebuilds cleanly.
First run after merge will cold-miss on the new key; subsequent runs
hit the unified cache.
|
||
|
|
b6fdd46636
|
Merge pull request #26270 from BerriAI/litellm_/lucid-kowalevski-de832f
[Fix] Stabilize flaky spend accuracy tests + patch Redis buffer data-loss path |
||
|
|
5445297da9
|
[Fix] Stabilize flaky spend accuracy tests with local ground truth
Replace the calibration step (one request + 10-minute poll) with an independent ground truth computed from response usage via litellm.cost_per_token. All N requests are made up front, so a single dropped Redis write no longer kills the test. Add /health/readiness checks at test start and on poll timeout so the failure message surfaces proxy state (db, cache) instead of "calibration timed out". Set PROXY_BATCH_WRITE_AT=2 in the spend tracking CI job to shorten the scheduler flush window. |
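The idea of an independent ground truth can be sketched as summing per-request cost from the usage each response reports. The commit computes cost via litellm.cost_per_token; the sketch below substitutes fixed, hypothetical per-token prices so that it stays self-contained, and `Usage`, `expected_spend`, and the tolerance value are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Usage:
    """Token counts as reported in one response's usage block."""
    prompt_tokens: int
    completion_tokens: int


def expected_spend(
    usages: Iterable[Usage],
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Sum the cost of every request from its own reported usage."""
    return sum(
        u.prompt_tokens * input_cost_per_token
        + u.completion_tokens * output_cost_per_token
        for u in usages
    )


def spend_matches(actual: float, expected: float, rel_tol: float = 1e-6) -> bool:
    """Compare the spend read back from the proxy with the local ground truth."""
    return math.isclose(actual, expected, rel_tol=rel_tol)
```

Because the expected total is derived from the responses themselves, the test no longer needs a separate calibration request before it can assert on spend.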
||
|
|
1b74c35b89
|
[Infra] Move non-API-key CCI jobs to GitHub Actions
Principle: GHA handles work that doesn't need external API keys; CCI
stays for integration tests that hit real API endpoints.
Four CCI jobs moved to new or extended GHA workflows:
1. check_code_and_doc_quality (was 25 runs: ruff + import-safety +
21 code_coverage_tests + 3 documentation_tests + circular-imports).
- The 21 tests/code_coverage_tests/*.py scripts and the 3
tests/documentation_tests/*.py scripts run in the new
.github/workflows/test-code-quality.yml workflow.
- ruff, import-safety, and circular-imports were already run by
.github/workflows/test-linting.yml — no new migration needed.
- The 3 documentation_tests scripts read
docs/my-website/docs/proxy/config_settings.md. Since docs have
moved to BerriAI/litellm-docs, the GHA workflow checks out that
repo and symlinks docs/my-website -> the checkout so the
existing hardcoded paths resolve without touching the scripts.
The stale local docs/my-website/ copy in this repo will be
removed in a separate PR.
2. semgrep (custom-rule SAST against .semgrep/rules).
- New .github/workflows/test-semgrep.yml.
3. installing_litellm_on_python + installing_litellm_on_python_3_13
(pip install compat checks on Python 3.12 and 3.13).
- New .github/workflows/test-install-litellm.yml as a matrix job.
- 3.12 run also verifies litellm_enterprise import; 3.13 run
skips that check (matches previous CCI behavior).
- installing_litellm_on_python_v2_migration_resolver stays in CCI
because it requires a postgres service.
CCI .circleci/config.yml: -112 lines, 4 jobs and their workflow refs
removed.
|
||
|
|
61fd4e985e
|
[Infra] CCI config cleanup — dead step, filter dupe, cache keys, machine image
Follow-up cleanup after an independent review pass surfaced a few
loose ends:
- Delete a 6x-duplicated filter block in litellm_mapped_tests_proxy_part2
(same kind of copy-paste residue we fixed earlier in
langfuse_logging_unit_tests).
- Delete the empty "Install Semgrep" run step in the semgrep job — the
command body was empty because semgrep is installed on-demand via
uv tool run in the next step.
- Standardize machine-executor image: one job was on ubuntu-2204:2023.10.1
while build_docker_database_image was already on ubuntu-2204:2024.04.1.
Bumped everything to 2024.04.1.
- Remove the legacy "version: 2" inside the workflows: block — CircleCI
2.1 top-level already declares the version.
- Drop `{{ checksum ".circleci/config.yml" }}` from cache keys (13 sites).
It was busting the cache on every unrelated config edit; the uv.lock
checksum alone is the right dependency cache key.
- Add partial-restore fallbacks to every restore_cache with a single
templated key (10 sites). Jobs now fall back to the latest cache with
a matching prefix if the exact uv.lock hash isn't cached yet.
Net: -14 lines.
|
||
|
|
0a65d2c535
|
[Infra] Standardize default Python to 3.12 and remove miniconda setup
Docker-executor jobs:
- Consolidate base images on cimg/python:3.12. Jobs previously on 3.11 (26 jobs), 3.9 (1 historical: upload-coverage), and an incidental 3.13.1 (litellm_assistants_api_testing) now use 3.12.
- installing_litellm_on_python_3_13 keeps cimg/python:3.13.1 as its explicit "latest Python supported" install-check matrix job.
Machine-executor jobs:
- Delete the miniconda install step from 10 jobs. uv now manages Python directly: uv sync --python 3.12 auto-downloads a python-build-standalone interpreter if the ubuntu-2204 base image's default python doesn't match.
- Remove 37 "if [ -f conda.sh ]; then conda activate myenv" wrappers and 2 unconditional conda activate blocks left behind from the conda days.
- proxy_build_from_pip_tests keeps its 3.13 target (it was conda create -n myenv python=3.13) via uv sync --python 3.13.
Net: -301 lines. |
||
|
|
344be27e83
|
[Refactor] Add start_postgres reusable command and migrate call sites
Add a start_postgres command parameterized on db_name (default circle_test) that runs the postgres-db container and waits for port 5432 to accept connections. Replace all 11 inline docker run / wait_for_service blocks with a single - start_postgres call. The helm chart test overrides db_name to litellm_test; everything else uses the default. One of the 11 sites previously used a bespoke pg_isready loop instead of wait_for_service; it now goes through the same TCP-probe path everyone else uses, which is sufficient for test ordering purposes. Net: -112 lines. |
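The "wait for port 5432 to accept connections" step is a plain TCP probe. A minimal sketch of such a probe, assuming a polling loop with a deadline (`wait_for_port` and its parameters are hypothetical and not the actual wait_for_service implementation):

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (refused or timed out): wait and retry.
            time.sleep(interval)
    return False
```

A probe like this only shows that the port is open, not that Postgres is ready to serve queries the way pg_isready would report; the commit judges that sufficient for test ordering.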
||
|
|
f490340a52
|
[Refactor] Add install_uv reusable command and migrate all call sites
Add a single install_uv command in the commands: section that encodes the uv version (0.10.9) and its SHA256 in one place, then replace all 42 inline curl|sha256|install blocks across every job that needs uv. setup_litellm_test_deps now calls install_uv too, so the shared test-dep bootstrap goes through the same path. Bumping uv version or SHA is now a one-line change instead of 43. Net: -203 lines. |
||
|
|
439bbd223b
|
[Infra] Clean up unused CCI jobs and pin docker images by digest
- Remove mypy_linting job (GHA test-linting.yml already runs this)
- Remove three redundant "Install curl" apt-get steps (curl is already present on the ubuntu-2204 machine image and used successfully earlier in each affected job)
- Dedupe langfuse_logging_unit_tests filter block (6x copy of the same two branch filters collapsed to 1)
- Pin all docker image references by @sha256 digest so builds stay reproducible when upstream tags are updated: cimg/python:3.9, 3.11, 3.12, 3.12-browsers, 3.13.1, cimg/node:20.19, cimg/postgres:16.0, and postgres:14 used via docker run
Net: -62 lines, 49 image references pinned. |
||
|
|
ee550e1949
|
[Test] CI: add v2 migration resolver coverage with local Postgres
Adds end-to-end CI coverage for `--use_v2_migration_resolver` via a new job `installing_litellm_on_python_v2_migration_resolver`:
- Clones the pytest smoke path from `installing_litellm_on_python` but uses a local Postgres sidecar instead of the shared DB to prevent collisions with the v1 variant.
- Runs only the new `test_litellm_proxy_server_config_no_general_settings_v2_resolver` which spawns the proxy with `--use_v2_migration_resolver` and smoke-tests `/health/liveliness` and `/chat/completions`.
Refactors `test_basic_python_version.py`:
- Extracts the proxy spawn + smoke-test body into `_run_proxy_server_smoke_test` so the v1 and v2 tests share the same code path.
- The existing `test_litellm_proxy_server_config_no_general_settings` is now a thin wrapper that passes no extra args (v1 default, unchanged).
- Adds `..._v2_resolver` variant that passes `--use_v2_migration_resolver`.
The existing `installing_litellm_on_python` / `installing_litellm_on_python_3_13` jobs filter out the v2 variant via `-k "not v2_resolver"` so they keep running only against their shared DB, unchanged behavior. |
||
|
|
0f5d503169
|
fix(ci): make e2e_ui_testing actually test the freshly built UI bundle
The Build UI from source step used:
cp -r out/ ../../litellm/proxy/_experimental/out/
GNU cp (CircleCI's Ubuntu image, coreutils 8.32) interprets this as
"copy the source directory as a child of the destination" when the
destination already exists, so the command silently created
litellm/proxy/_experimental/out/out/ instead of replacing the served
bundle at litellm/proxy/_experimental/out/*.
The proxy continued serving whatever bundle was checked in, so every
e2e_ui_testing run between this job's introduction (
|
||
|
|
bb62099323
|
[Fix] CI - auth_ui_unit_tests: use Postgres sidecar instead of shared DB
Run auth_ui_unit_tests against a per-job cimg/postgres:16.0 sidecar with DATABASE_URL pointing at localhost:5432, matching the pattern used by e2e_ui_testing. Seed the schema via 'litellm --skip_server_startup --use_prisma_db_push' so each run starts on a clean DB with the current schema.prisma. |
||
|
|
f24c8dbf79
|
chore: bump CircleCI conda envs from python 3.9 to 3.10
Six CI jobs create a miniconda env with python=3.9 before installing the project; these jobs now fail resolution because the project requires-python is >=3.10. Bump the conda env python to 3.10 to match the new floor. |
||
|
|
ebac729146
|
[Infra] CI: reduce llm_translation_testing parallelism and tolerate worker restarts
Workers in llm_translation_testing have been crashing mid-run with "Not properly terminated" (OOM), even after bumping resource_class to xlarge. Reduce xdist workers from 8 to 4 to lower peak memory, and add --max-worker-restart=5 so a crashed worker is replaced instead of failing the whole run. |
||
|
|
65717add14
|
Merge pull request #25887 from BerriAI/litellm_/vigilant-cannon
[Infra] Bump llm_translation_testing resource class to xlarge |
||
|
|
72ba880905
|
[Infra] Bump llm_translation_testing resource class to xlarge | ||
|
|
55f2a898be
|
[Infra] Remove unused publish_proxy_extras and prisma_schema_sync jobs
publish_proxy_extras is superseded by PyPI trusted publishing (OIDC); the CircleCI project no longer has PYPI_PUBLISH_* credentials configured. prisma_schema_sync is a leaf smoke test with no dependents, and db push against the current schema is already exercised by e2e_ui_testing. |
||
|
|
a01cf44c35
|
fix: remove non-existent litellm_mcps_tests_coverage from coverage combine |