783 Commits

Author SHA1 Message Date
Yuneng Jiang
491aa7ea51
[Fix] CI: fix 6 more CircleCI job failures from uv migration
1. check_code_and_doc_quality: add PLR0915 ignore for sandbox_executor.py
2. auth_ui_unit_tests: add prisma generate step (entrypoint.sh only runs
   migration, not client generation)
3. proxy_store_model_in_db_tests: move does_mcp_server_exist outside
   `if MCP_AVAILABLE:` so it's importable on Python < 3.10
4. build_and_test: fix datetime.fromisoformat('...Z') on Python 3.9
   (Z suffix support was added in 3.11)
5. proxy_logging_guardrails: fix container name typo my-app-2 -> my-app-3
6. upload-coverage: use `uv tool run` instead of `uv run --with` to avoid
   resolving the full workspace (which fails for Python 3.14)
2026-04-10 21:06:25 -07:00
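Item 4 of the commit above turns on a standard-library version difference: `datetime.fromisoformat` accepts a trailing `Z` only from Python 3.11 onward. A minimal sketch of one portable workaround follows. The helper name `parse_iso_utc` is hypothetical, and this is not necessarily the change the commit made.

```python
from datetime import datetime, timezone


def parse_iso_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z' on Python < 3.11."""
    # Older interpreters reject 'Z' but accept the equivalent '+00:00' offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


parsed = parse_iso_utc("2026-04-10T21:06:25Z")
assert parsed == datetime(2026, 4, 10, 21, 6, 25, tzinfo=timezone.utc)
```

Rewriting the suffix keeps the result timezone-aware, so comparisons against other aware datetimes behave the same on 3.9 and 3.11+.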
Yuneng Jiang
93d340c1ad
[Fix] CI: fix uv migration breaking 7 CircleCI jobs
setup_litellm_enterprise_pip was running `uv sync --package litellm-enterprise`
which overwrites the shared .venv, stripping out pytest, prisma, and other
dev/test deps. Since litellm-enterprise is a workspace member it is already
installed by the main `uv sync --all-groups --all-extras`. Replace with a
verification-only import check.

Also prefix bare `ruff check` with `uv run --no-sync` since uv does not
auto-activate the venv.
2026-04-10 17:09:33 -07:00
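The "verification-only import check" described above could look roughly like the sketch below. It only asks whether each package is discoverable in the current environment and installs nothing. The function name `missing_modules` is hypothetical, and the importable module name `litellm_enterprise` in the usage note is an assumption; the commit names only the package.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    missing = []
    for name in names:
        # find_spec returns None for an absent top-level module; it does not import it.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

A CI step might call `missing_modules(["litellm_enterprise", "pytest", "prisma"])` and fail the job if the returned list is non-empty, which would also surface the stripped dev/test dependencies this commit describes.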
Yuneng Jiang
bb7ac7c4ca
[Fix] Finish uv migration for redis_caching, e2e_ui, and fix prisma/black in CI
- Replace `uv run --no-sync prisma generate` with `python -m prisma generate`
  in proxy_part1, proxy_part2, and enterprise jobs (fixes spawn error)
- Migrate redis_caching_unit_tests from requirements.txt to uv sync
- Migrate e2e_ui_testing from requirements.txt to uv sync, replace bare
  prisma/python calls with uv run equivalents
- Bump venv cache keys from v1 to v2 with config.yml checksum to bust
  stale caches missing black and other dev dependencies
2026-04-10 16:10:50 -07:00
Yuneng Jiang
4b6eb02b66
[Fix] Pin uv/pip versions and fix bare prisma calls in CI
- Pin `pip==26.0.1` and `uv==0.10.9` in CCI jobs that used unpinned
  `pip install uv` (redis_caching_unit_tests, ui_e2e_tests)
- Replace bare `prisma generate` with `uv run --no-sync prisma generate`
  in proxy_part1, proxy_part2, and enterprise test jobs
- Remove duplicate `check=True` kwarg in test_basic_python_version.py
  that caused TypeError with `_run_uv()` helper
2026-04-10 00:04:32 -07:00
stuxf
a6c30b30bf
build: migrate packaging, CI, and Docker from Poetry to uv (#25007)
* build: migrate packaging metadata to uv

* ci: move automation and local tooling to uv

* docker: migrate image builds and runtime setup to uv

* docs: update install and deployment guidance for uv

* chore: align auxiliary scripts and tests with uv

* test: harden test_litellm isolation

* fix: keep release and health check images self-contained

* build: pin uv tooling and health check deps

* test: isolate bedrock image request formatting from suite state

* test: cover sandbox executor requirements flow

* ci: fix circleci no-op command steps

* ci: fix circleci publish workflow parsing

* fix: stabilize remaining uv migration CI checks

* ci: increase matrix test timeout headroom

* fix: restore published docker and license coverage

* fix: restore proxy runtime build parity

* fix: restore proxy extras parity and venv migrations

* ci: persist uv path across circleci steps

* fix: keep psycopg binary in default test env

* docker: preserve prisma cache across stages

* test: run local proxy checks through uv python

* build: restore runtime deps moved into ci

* build: refresh uv lock after upstream merge

* fix: restore module import in test_check_migration after merge

The conflict resolution imported only the function but the test body
references check_migration as a module throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching

- Move google-generativeai, Pillow, tenacity back to ci group (they are
  lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
  in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
  from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
  environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
  deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate uv.lock after removing nodejs-wheel-binaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): use cache/restore instead of cache to prevent cache poisoning

The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert

The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv cache in publish workflow

Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate verbose_logger mock in test_check_migration

The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): free disk space before Docker build in test-server-root-path

The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:46:23 -07:00
yuneng-jiang
072d4108c3
Merge pull request #25365 from BerriAI/litellm_e2e_ui_tests
[Feature] UI E2E Tests: Proxy Admin Team and Key Management
2026-04-08 15:29:10 -07:00
Yuneng Jiang
4ee7d42981
[Fix] Restructure HTML files after UI build so extensionless routes work in CI 2026-04-08 13:24:52 -07:00
Yuneng Jiang
ac9ebdf4d8
[Fix] Rename CI job to e2e_ui_testing and remove duplicate old job definition 2026-04-08 13:17:45 -07:00
Yuneng Jiang
a8f4f464ce
[Fix] Add missing test fixtures and address review feedback
- Add constants.ts with all required exports (key aliases, team IDs)
- Add fixtures/users.ts with all role definitions and storage paths
- Add fixtures/seed.sql for deterministic test database seeding
- Remove Firefox project from playwright config (only Chromium installed)
- Remove unused variable in teams.spec.ts
- Rename CircleCI job to e2e_ui_testing
2026-04-08 12:40:41 -07:00
Yuneng Jiang
d09d98a70a
[Feature] E2E UI tests: proxy-admin team and key management with CI integration
Add Playwright E2E tests covering proxy admin team and key management
workflows, with a self-contained test runner and CircleCI integration.

Tests cover: create team, invite user, edit/delete team members, create
key in team, regenerate key, update TPM/RPM limits, delete key, and
verify internal user keys are visible.

Infrastructure: run_e2e.sh builds the UI from source before starting
the proxy, ensuring tests always run against the latest UI changes.
Added data-testid attributes to key UI components for reliable selectors.
2026-04-08 11:51:15 -07:00
Yuneng Jiang
7ba0c69a07
[Fix] Install pytest-rerunfailures in redis caching CircleCI job 2026-04-08 11:50:00 -07:00
Yuneng Jiang
0104b60d8e
[Infra] Add redis_caching_coverage to coverage combine command 2026-04-08 10:48:41 -07:00
Yuneng Jiang
3a02c0ac6b
[Infra] Migrate Redis caching tests from GHA to CircleCI
Redis caching unit tests (test_dual_cache, test_redis_batch_optimizations,
test_router_utils) required Redis secrets that should live in CircleCI.

- Add redis_caching_unit_tests job to CircleCI config
- Delete test-unit-caching-redis.yml GHA workflow
- Remove all Redis plumbing (inputs, secrets, env vars) from
  _test-unit-services-base.yml and its callers
2026-04-08 09:07:12 -07:00
yuneng-jiang
a60e19aeb8
Remove flaky proxy_e2e_azure_batches_tests CI workflow (#25247)
The proxy_e2e_azure_batches_tests workflow is consistently flaky and
does not provide reliable signal on whether changes break anything.
Remove the workflow from both CircleCI and GitHub Actions, along with
the test directory it exclusively used.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:49:14 -07:00
Yuneng Jiang
006d481025
[Fix] Remove neon CLI dependency and pin all JS dependencies
Remove @neondatabase/api-client and neonctl to address CVE-2026-25639
(axios supply chain vulnerability). Pin all JS dependencies to exact
versions across all package.json files to prevent future supply chain
attacks via semver range resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:15:32 -07:00
Yuneng Jiang
85f72c9d24
[Fix] Remove unused aioboto3 dependency and botocore conflict workarounds
aioboto3 was listed as a dependency for async sagemaker calls but is not
imported anywhere in the codebase — async calls use httpx + botocore SigV4
instead. Removing it eliminates the unresolvable botocore version conflict
between boto3 and aiobotocore, along with all grep -v / --no-deps workarounds
across Dockerfiles and CI.

Also addresses Greptile review feedback: collapse redundant grpcio
python-version markers, bump pyproject.toml cryptography to 46.0.5 to
match Docker (GHSA-r6ph-v2qm-q3c2), and fix misleading .npmrc comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 14:25:44 -07:00
Yuneng Jiang
821a634d25
[Fix] Handle boto3/aioboto3 botocore conflict across CI and Docker builds
boto3==1.42.80 and aioboto3==15.5.0 have incompatible botocore version
ranges. No aioboto3 release supports botocore 1.42.x yet. Both uv and
pip 26.0.1 reject the resolution.

Fix: filter aioboto3 out of requirements.txt at install time, then
install aioboto3+aiobotocore with --no-deps to bypass resolution.
Added wrapt and aioitertools to requirements.txt as pinned transitive
deps of aiobotocore (skipped by --no-deps). Fixed pip stdin handling
(/dev/stdin). Applied to all 5 Dockerfiles and all CircleCI install
paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:27:21 -07:00
Yuneng Jiang
fc8eb81549
[Fix] Filter aioboto3 from resolver to fix boto3/aioboto3 conflict
boto3==1.42.80 and aioboto3==15.5.0 have incompatible botocore ranges.
Both uv and pip 26.0.1 reject the resolution. Fix: filter aioboto3 out
of requirements.txt at install time, then install aioboto3+aiobotocore
separately with --no-deps to bypass resolution. Removes uv-overrides.txt
which only partially addressed the conflict.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 11:57:07 -07:00
Yuneng Jiang
467abd1909
[Fix] Add uv override for boto3/aioboto3 botocore conflict
boto3==1.42.80 requires botocore>=1.42.80 but aioboto3==15.5.0 (via
aiobotocore==2.25.1) requires botocore<1.40.62. No aioboto3 release
supports botocore 1.42.x yet. pip's lenient resolver handles this for
Docker builds, but uv's strict resolver rejects it in CI. Added
uv-overrides.txt to force botocore to match boto3 during uv installs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 11:34:19 -07:00
Yuneng Jiang
43077af378
[Fix] Sync CircleCI dependency pins with requirements.txt
CircleCI had stale version pins (e.g. boto3==1.36.0, aioboto3==13.4.0) that
conflict with requirements.txt (boto3==1.42.80, aioboto3==15.5.0), causing
uv resolution failures. Updated all mismatched pins across config.yml and
.circleci/requirements.txt to match requirements.txt as the source of truth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 11:27:44 -07:00
Ishaan Jaffer
be553c7204 fix aporia 2026-03-30 21:49:31 -07:00
Krrish Dholakia
cbd6253f9c test: skip chromium/firefox check - TODO: move to a dynamic db 2026-03-30 20:55:27 -07:00
Ishaan Jaffer
f1e7aa9dbb db_migration_disable_update_check 2026-03-30 19:46:18 -07:00
Ishaan Jaffer
3fbe7d1059 prisma_schema_sync 2026-03-30 19:40:25 -07:00
Krrish Dholakia
37440c28b7 test: use dynamic db 2026-03-30 19:33:52 -07:00
Yuneng Jiang
7aec9101f5
[Infra] Remove CircleCI jobs now covered by GitHub Actions
Removes 10 CircleCI jobs that are fully duplicated by GHA workflows:
- caching_unit_tests → test-unit-caching-redis.yml
- litellm_security_tests → test-unit-security.yml
- litellm_proxy_unit_testing_{key_generation,part1,part2} → test-unit-proxy-db.yml + test-unit-proxy-legacy.yml
- litellm_mapped_tests_{llms,core,litellm_core_utils,mcps,integrations} → test-unit-*.yml workflows

Also cleans up upload-coverage requires and workflow entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 19:06:48 -07:00
Yuneng Jiang
ba8455a3be
[Infra] Migrate PyPI publishing from CircleCI to GitHub Actions OIDC
- Add .github/workflows/publish_to_pypi.yml with OIDC trusted publisher
- Remove publish_to_pypi job from .circleci/config.yml
- Zero long-lived tokens, all actions SHA-pinned, build deps version-pinned
2026-03-26 19:02:14 -07:00
Ishaan Jaff
81dadb698a
Ishaan - March 18th changes (#24056)
* add DD Tracing (#24033)

* feat(models): add Azure GPT-5.4 mini and nano variants (#24045)

Add `azure/gpt-5.4-mini` and `azure/gpt-5.4-nano` to the model
database with official pricing from Azure OpenAI:

- GPT-5.4 mini: $0.75/M input, $0.075/M cached, $4.5/M output
- GPT-5.4 nano: $0.20/M input, $0.02/M cached, $1.25/M output

Both models support:
- 1.05M input / 128K output context window
- Chat, batch, and responses endpoints
- Function calling, tools, vision, reasoning
- Prompt caching with automatic tiered pricing

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add new model pricing details for volcengine Doubao-Seed-2.0 series (#23871)

Add entries for volcengine Doubao-Seed-2.0 series

* fix(mcp): support refresh_token grant type in OAuth token endpoint (#23701)

* fix(mcp): support refresh_token grant type in OAuth token endpoint (#23700)

The .well-known/oauth-authorization-server metadata advertises
refresh_token as a supported grant type, but the token endpoint
rejected it with HTTP 400. This adds refresh_token grant support
so MCP clients can refresh expired tokens without re-authenticating.

* test(mcp): add tests for refresh_token grant type in OAuth token endpoint

* fix(mcp): move code_verifier guard into authorization_code branch

code_verifier is only relevant for authorization_code grants (PKCE).
Move it inside the else branch so it doesn't apply to refresh_token.

* fix(mcp): guard None client_secret and forward scope in token exchange

- Conditionally include client_secret in form data to prevent httpx
  from sending the literal string "None" (applies to both
  authorization_code and refresh_token branches)
- Forward optional scope parameter per RFC 6749 §6, allowing clients
  to request a subset of originally-granted scopes on refresh

* fix(mcp): validate code param in authorization_code grant

Guard against None code being form-encoded as literal string "None"
by httpx, symmetric with the existing refresh_token guard.

* docs: add incident report for guardrail logging secret exposure (#24059)

Add blog post documenting the guardrail logging path exposing internal
request data (e.g. Authorization headers) in spend logs and OTEL traces.
Fix available in LiteLLM 1.82.3+.

Made-with: Cursor

* [Fix] Datadog LLM Observability tags format (env, service, version missing) (#23673)

* tag fix

* greptile comment

* fix(ci): stabilize 6 failing CI jobs

1. mypy: remove duplicate type annotation for token_data in discoverable_endpoints.py
2. integrations tests: add parameterized to CI test deps
3. doc quality: document OTEL_IGNORE_CONTEXT_PROPAGATION env key
4. security: allowlist CVE-2026-2673, CVE-2026-3644, CVE-2026-4224 (no fix available)
5. proxy_store_model_in_db: fix missing x-litellm-call-id header on error responses
6. google tests: add --retries 3 for transient Vertex AI rate limits

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(streaming): handle RuntimeError during model_copy in streaming handler

The race condition occurs when model_copy(deep=True) tries to deepcopy
_hidden_params dict while it's being concurrently modified by logging
callbacks. Fall back to shallow copy if the deep copy fails.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(cost): handle non-string traffic_type in cost calculator + add retries

1. Fix AttributeError in _map_traffic_type_to_service_tier when traffic_type
   is an integer (cast to str before calling .upper()). This was causing
   pass-through vertex spend logging to fail silently.
2. Add --retries to llm_translation_testing for flaky external API calls.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ExMatics HydrogenC <33123710+HydrogenC@users.noreply.github.com>
Co-authored-by: Jack Venberg <jack.venberg@rover.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-03-19 10:20:35 -07:00
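The streaming fix in the commit above falls back to a shallow copy when a deep copy fails because the data is being mutated concurrently. The sketch below illustrates that pattern with the standard `copy` module as a stand-in for the `model_copy(deep=True)` call named in the message. The helper name `copy_with_fallback` is hypothetical.

```python
import copy
from typing import TypeVar

T = TypeVar("T")


def copy_with_fallback(obj: T) -> T:
    """Deep-copy obj, falling back to a shallow copy if the deep copy raises RuntimeError."""
    try:
        return copy.deepcopy(obj)
    except RuntimeError:
        # For example "dictionary changed size during iteration", raised when
        # another thread or task mutates a nested dict in the middle of the copy.
        return copy.copy(obj)
```

The trade-off is that the shallow copy shares nested objects with the original, so later mutations remain visible through both references.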
yuneng-jiang
278c9babc6
[Infra] Merging RC Branch with Main (#23786)
* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock

The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): route OpenAI models through chat completions in pass-through tests

The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): fix assemblyai custom auth and router wildcard test flakiness

1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
   user can access management endpoints like /key/generate. The test
   test_assemblyai_transcribe_with_non_admin_key was hidden behind an
   earlier -x failure and was never reached before.

2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
   to 2s for test_router_get_model_group_usage_wildcard_routes. The async
   callback needs time to write usage to cache, and 1s is insufficient on
   slower CI hardware.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retrigger CI pipeline

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic

Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)

* fix: don't close HTTP/SDK clients on LLMClientCache eviction

Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979432. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.

* test: update LLMClientCache unit tests for no-close-on-eviction behavior

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

* test: add e2e tests for OpenAI SDK client surviving cache eviction

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction

* docs(CLAUDE.md): add HTTP client cache safety guideline

* [Fix] Install bsdmainutils for column command in security scans

The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle string callback values in prometheus multiproc setup

When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
  TypeError: can only concatenate str (not "list") to str

Normalize each callback setting to a list before concatenating.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* bump: version 1.82.2 → 1.82.3

* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement

The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing

When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.

Now checks whether the router_model_id entry actually has pricing before preferring it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 15:32:20 -07:00
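The prometheus multiproc fix in the commit above normalizes each callback setting to a list before concatenating, which avoids the `str + list` `TypeError` it quotes. A minimal sketch of that normalization follows. The names `as_callback_list` and `merge_callbacks` are hypothetical, and the sketch assumes callbacks are plain strings.

```python
from typing import List, Optional, Sequence, Union

CallbackSetting = Optional[Union[str, Sequence[str]]]


def as_callback_list(value: CallbackSetting) -> List[str]:
    """Coerce a callback setting (None, a single string, or a sequence) to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        # A bare string is one callback, not a sequence of characters.
        return [value]
    return list(value)


def merge_callbacks(*settings: CallbackSetting) -> List[str]:
    """Concatenate several callback settings after normalizing each one."""
    merged: List[str] = []
    for setting in settings:
        merged += as_callback_list(setting)
    return merged


assert merge_callbacks("my_callback", ["prometheus"], None) == ["my_callback", "prometheus"]
```

Checking for `str` before calling `list()` matters because `list("my_callback")` would split the name into single characters.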
yuneng-jiang
9cec81a087 [Fix] Revert proxy unit test groupings to prevent xdist state pollution
Part1 had 4 test files combined (was originally 2), causing cross-file
state pollution under xdist. Reverted to original grouping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 00:48:56 -07:00
yuneng-jiang
2372427dbc [Fix] Remove xdist from caching_unit_tests to fix GCS cache test failures
GCS cache tests (test_gcs_cache_unit_tests.py) rely on module-level state
(vertex_chat_completion singleton, credential caches) that importlib.reload
resets but the xdist-safe function-scoped fixture does not. Removing -n 4
from this job restores single-process execution where module reload properly
resets all state before each test, while CI-level parallelism (parallelism: 2)
still splits test files across nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 00:23:04 -07:00
yuneng-jiang
96183e8bde [Fix] Drop --no-deps from aurelio_sdk in guardrails and enterprise tests
aurelio_sdk imports requests_toolbelt at load time, so it needs its deps.
Unlike semantic_router, aurelio_sdk has no conflict with openai>=2, so
--no-deps is unnecessary. Verified via uv dry-run locally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 23:11:43 -07:00
yuneng-jiang
f68a9be04d [Infra] Optimize CI: migrate litellm_security_tests from machine to docker xlarge
Switch from expensive Linux machine (medium) to docker xlarge executor.
Drop miniconda, manual Docker CLI install, and manual PostgreSQL container
in favor of cimg/python:3.13, setup_remote_docker, and service container.
Use uv + cache for dependency installation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 23:07:22 -07:00
yuneng-jiang
eba54bae11 [Fix] Add aurelio_sdk --no-deps alongside semantic_router in guardrails and enterprise tests
semantic_router imports aurelio_sdk at module load time, so it must be
installed even when using --no-deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 23:02:35 -07:00
yuneng-jiang
ae1e827319 [Infra] Optimize CI: add xdist to caching tests, drop Docker CLI installs, reduce verbosity
- caching_unit_tests: add resource_class large, enable xdist -n 4, drop unused coverage collection
- build_and_test & proxy_pass_through_endpoint_tests: remove redundant Docker CLI install (machine executor has it)
- Downgrade -vv to -v across 4 jobs to reduce log noise

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 22:41:27 -07:00
yuneng-jiang
f07301a518 [Infra] Optimize CI: right-size resource classes, drop unused coverage, increase xdist workers
Downgrade langfuse, assistants, and python 3.13 install jobs to medium (were defaulting to large at ~25% CPU).
Bump enterprise and image_gen xdist workers to -n 4 on explicit large instances.
Drop coverage collection and persist_to_workspace for 4 jobs that no longer need it.
Downgrade verbosity from -vv to -v across all 5 jobs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 22:27:50 -07:00
yuneng-jiang
65b3335735 [Infra] Use uv for requirements.txt installs across 22 CI jobs
Switch pip install -r requirements.txt to uv pip install --system -r requirements.txt
for all docker-based jobs that use the main requirements.txt. This applies the same
optimization already proven in the mapped test jobs to the rest of the CI pipeline.

Also adds --no-deps to semantic_router installs in guardrails_testing and
litellm_mapped_enterprise_tests to avoid uv's strict resolution conflict with openai>=2.

Skipped: machine executor + conda jobs (security, proxy_spend_accuracy,
proxy_multi_instance, proxy_store_model_in_db) and Group B jobs using
.circleci/requirements.txt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 22:22:44 -07:00
yuneng-jiang
379c3952f4 [Fix] Use uv for requirements.txt only, pip for test deps with conflicting pins
uv's strict resolver rejects transitive dep conflicts (semantic-router
wants openai<2, llm-sandbox wants pydantic>=2.11.5). Use uv for the
heavy requirements.txt install and pip for the small test dep batch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:58:07 -07:00
yuneng-jiang
26207bb7be [Infra] Speed up mapped test jobs: uv installs, site-packages caching, drop unused coverage
- Switch setup_litellm_test_deps from pip to uv with batched installs
- Cache installed site-packages (~/.local/lib, ~/.local/bin) instead of
  pip download cache for near-instant installs on cache hit
- Remove unused coverage collection from 6 mapped test jobs (only mcps
  coverage is consumed by the coverage combine step)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:50:18 -07:00
yuneng-jiang
1a00dd4dbb Fix router test isolation for xdist and rebalance proxy unit tests
Router tests: expand conftest save/restore to cover all globals mutated
by router tests (default_fallbacks, tag_budget_config, request_timeout,
enable_azure_ad_token_refresh, num_retries_per_request, model_cost,
token_counter). These were leaking across xdist workers.

Proxy tests: move test_proxy_utils.py (169 parametrized) and
test_proxy_server.py (72 parametrized) from part2 to part1, balancing
~370 vs ~360 tests (was ~129 vs ~600).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:36:56 -07:00
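The save/restore approach mentioned for the router conftest can be sketched as a context manager that snapshots selected module globals and puts them back afterwards. The name `preserved_globals`, the stand-in namespace, and the sample values are all hypothetical; the real conftest operates on the actual module and would typically wrap this logic in a pytest fixture.

```python
import copy
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Iterator, Sequence


@contextmanager
def preserved_globals(module: object, names: Sequence[str]) -> Iterator[None]:
    """Snapshot the named attributes of module and restore them on exit."""
    saved = {name: copy.deepcopy(getattr(module, name)) for name in names}
    try:
        yield
    finally:
        # Restore even if the wrapped test body raised.
        for name, value in saved.items():
            setattr(module, name, value)


# Stand-in for a module with mutable globals (values are illustrative only).
fake_module = SimpleNamespace(request_timeout=600, default_fallbacks=None)

with preserved_globals(fake_module, ["request_timeout", "default_fallbacks"]):
    fake_module.request_timeout = 1
    fake_module.default_fallbacks = ["some-model"]

assert fake_module.request_timeout == 600
assert fake_module.default_fallbacks is None
```

Compared with reloading the module for every test, this restores only the globals that tests are known to mutate, which is cheaper but requires keeping the list of names complete.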
yuneng-jiang
74e57bdd27 Optimize CI test jobs: increase xdist workers, drop coverage, add caching
Increase pytest-xdist parallelism to match available CPU on I/O-bound and
CPU-bound test jobs. Drop coverage collection from 8 jobs (still collected
by ~15 other jobs). Add dependency caching to 4 uncached jobs. Reduce
verbose output (-vv to -v) and remove -s/--log-cli-level overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:20:50 -07:00
yuneng-jiang
c1efbd3c8a [Fix] Drop httpx and opentelemetry pins from local_testing batched installs
Batching pip installs surfaced more hidden conflicts:
- respx==0.22.0 requires httpx>=0.25.0, conflicting with httpx==0.24.1
- traceloop-sdk==0.21.1 requires otel-semantic-conventions<0.46,
  conflicting with opentelemetry-sdk==1.25.0 (needs ==0.46b0)

These were masked before because separate pip install calls let later
installs silently override earlier pins. Dropping the pins lets pip
resolve compatible versions. Verified with pip --dry-run locally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:10:25 -07:00
yuneng-jiang
59f0db0538 [Fix] Remove anyio==4.2.0 pin from local_testing batched installs
Batching pip installs exposed a dependency conflict: langfuse==2.59.7
requires anyio>=4.4.0, which conflicts with the anyio==4.2.0 pin.
Dropping the pin lets pip resolve a compatible version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 21:02:50 -07:00
yuneng-jiang
f3cc292daf [Infra] Speed up CI: batch pip installs and fix pytest -n parallelism
Several test jobs were underutilizing their CPU allocation (~25%) because
they were either missing pytest-xdist -n or using -n 2 on 4-vCPU machines.
Batching individual pip install calls into single commands reduces resolver
overhead and saves ~30-60s per job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 20:54:46 -07:00
yuneng-jiang
19e8a16cce Optimize logging_testing CI: suppress DEBUG logs, fix xdist isolation
- Add LITELLM_LOG=WARNING to suppress verbose DEBUG log output
- Remove -s flag so pytest captures stdout again instead of streaming it all to the job log
- Bump xdist workers from -n 2 to -n 4
- Add --timeout=120 for safety
- Rewrite conftest.py to use save/restore pattern (matching guardrails_tests)
  instead of per-function importlib.reload + event loop creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 18:24:57 -07:00
yuneng-jiang
ec537dd973 Add -n 2 parallelism to 8 medium jobs running serial tests
These jobs run on medium (2 CPU) but weren't using pytest-xdist,
leaving the second CPU idle. Added pytest-xdist dep and -n 2 to:
- auth_ui_unit_tests (~33 tests)
- litellm_router_unit_testing (~191 tests)
- mcp_testing (~112 tests)
- llm_responses_api_testing (~80 tests)
- search_testing (~53 tests)
- batches_testing (~45 tests)
- litellm_utils_testing (~205 tests)
- pass_through_unit_testing (~102 tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 17:15:54 -07:00
yuneng-jiang
a95cae6b1e Speed up build_docker_database_image: drop Docker upgrade, use zstd
- Use ubuntu-2204:2024.04.1 which ships with a recent Docker, eliminating
  the 1-minute `curl get.docker.com | sh` upgrade step
- Switch image save/load from gzip to zstd -1 -T0 for ~3-5x faster
  compression/decompression, saving ~30s on save and on each downstream load

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 17:09:37 -07:00
yuneng-jiang
4c4246ab4a Downgrade oversized resource classes to match actual workload
- proxy_unit_testing_key_generation: large→medium (serial, 1 test file)
- proxy_unit_testing_part1: large→medium, -n 4→2 (only 2 test files)
- mapped_tests_proxy_part1: xlarge→large, -n 8→4 (~2000 tests, 4 CPUs sufficient)

Saves ~40 credits/min across these 3 jobs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 16:19:43 -07:00
yuneng-jiang
65575f3992 Fix pytest -n worker oversubscription to match available CPUs
8 jobs had -n workers set higher than available vCPUs, causing context
switch overhead and degraded performance. Aligned -n to match resource_class:
- medium (2 CPU): enterprise -n 8→2, image_gen/logging/guardrails -n 4→2
- large (4 CPU): proxy_part1/llms/core/integrations -n 8→4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 15:59:53 -07:00
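The rule applied in the commit above, never running more pytest-xdist workers than the executor has vCPUs, can be expressed as a small clamp. The sketch below uses only the vCPU counts stated in the message (medium = 2, large = 4). The function and table names are hypothetical.

```python
# vCPU counts as stated in the commit message above.
VCPUS_BY_RESOURCE_CLASS = {"medium": 2, "large": 4}


def xdist_workers(resource_class: str, requested: int) -> int:
    """Clamp a requested pytest-xdist worker count to the vCPUs of the resource class."""
    vcpus = VCPUS_BY_RESOURCE_CLASS[resource_class]
    # Never exceed the available vCPUs, and never drop below one worker.
    return max(1, min(requested, vcpus))


assert xdist_workers("medium", 8) == 2
assert xdist_workers("large", 8) == 4
```

The result would be passed to pytest as `-n <workers>`. Hard-coding the table rather than detecting CPUs at runtime mirrors the commit, which sets `-n` explicitly per job.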
yuneng-jiang
a66b635a4c Upsize ui_build and ui_unit_tests CI machines for faster feedback
ui_build (medium → medium+) is on the critical path blocking 3 downstream
jobs. ui_unit_tests (medium+ → large, maxForks 3 → 5) targets ~7 min from ~11.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 15:36:59 -07:00