1. check_code_and_doc_quality: add PLR0915 ignore for sandbox_executor.py
2. auth_ui_unit_tests: add prisma generate step (entrypoint.sh only runs
migration, not client generation)
3. proxy_store_model_in_db_tests: move does_mcp_server_exist outside
`if MCP_AVAILABLE:` so it's importable on Python < 3.10
4. build_and_test: fix datetime.fromisoformat('...Z') on Python 3.9
(Z suffix support was added in 3.11)
5. proxy_logging_guardrails: fix container name typo my-app-2 -> my-app-3
6. upload-coverage: use `uv tool run` instead of `uv run --with` to avoid
resolving the full workspace (which fails for Python 3.14)
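For item 4 above, a minimal sketch of one way to accept a trailing `Z` on interpreters older than 3.11. The helper name `parse_iso_utc` is an assumption for illustration and is not taken from the change itself.

```python
from datetime import datetime


def parse_iso_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' on older Pythons."""
    # datetime.fromisoformat() only understands the 'Z' suffix from Python 3.11 on,
    # so rewrite it to the equivalent explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


parsed = parse_iso_utc("2024-01-02T03:04:05Z")
print(parsed.isoformat())  # 2024-01-02T03:04:05+00:00
```

The rewrite is a no-op for strings that already carry an explicit offset, so the same helper can be used on every supported Python version.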
setup_litellm_enterprise_pip was running `uv sync --package litellm-enterprise`
which overwrites the shared .venv, stripping out pytest, prisma, and other
dev/test deps. Since litellm-enterprise is a workspace member it is already
installed by the main `uv sync --all-groups --all-extras`. Replace with a
verification-only import check.
Also prefix bare `ruff check` with `uv run --no-sync` since uv does not
auto-activate the venv.
- Replace `uv run --no-sync prisma generate` with `python -m prisma generate`
in proxy_part1, proxy_part2, and enterprise jobs (fixes spawn error)
- Migrate redis_caching_unit_tests from requirements.txt to uv sync
- Migrate e2e_ui_testing from requirements.txt to uv sync, replace bare
prisma/python calls with uv run equivalents
- Bump venv cache keys from v1 to v2 with config.yml checksum to bust
stale caches missing black and other dev dependencies
- Pin `pip==26.0.1` and `uv==0.10.9` in CCI jobs that used unpinned
`pip install uv` (redis_caching_unit_tests, ui_e2e_tests)
- Replace bare `prisma generate` with `uv run --no-sync prisma generate`
in proxy_part1, proxy_part2, and enterprise test jobs
- Remove duplicate `check=True` kwarg in test_basic_python_version.py
that caused TypeError with `_run_uv()` helper
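The duplicate-kwarg failure in the last bullet can be reproduced with a small sketch. It assumes the helper already passes `check=True` to `subprocess.run`; `_run_tool` is a hypothetical stand-in for `_run_uv()` that runs the current interpreter instead of uv.

```python
import subprocess
import sys


def _run_tool(args, **kwargs):
    """Hypothetical stand-in for _run_uv(): it always sets check=True itself."""
    return subprocess.run([sys.executable, *args], check=True, **kwargs)


error = None
try:
    # Passing check=True again collides with the keyword the helper already supplies.
    _run_tool(["-c", "print('ok')"], check=True, capture_output=True)
except TypeError as exc:
    error = str(exc)

# Dropping the duplicate keyword lets the call go through.
result = _run_tool(["-c", "print('ok')"], capture_output=True, text=True)
```

The `TypeError` is raised while the arguments are being bound, so no subprocess is started in the failing case.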
* build: migrate packaging metadata to uv
* ci: move automation and local tooling to uv
* docker: migrate image builds and runtime setup to uv
* docs: update install and deployment guidance for uv
* chore: align auxiliary scripts and tests with uv
* test: harden test_litellm isolation
* fix: keep release and health check images self-contained
* build: pin uv tooling and health check deps
* test: isolate bedrock image request formatting from suite state
* test: cover sandbox executor requirements flow
* ci: fix circleci no-op command steps
* ci: fix circleci publish workflow parsing
* fix: stabilize remaining uv migration CI checks
* ci: increase matrix test timeout headroom
* fix: restore published docker and license coverage
* fix: restore proxy runtime build parity
* fix: restore proxy extras parity and venv migrations
* ci: persist uv path across circleci steps
* fix: keep psycopg binary in default test env
* docker: preserve prisma cache across stages
* test: run local proxy checks through uv python
* build: restore runtime deps moved into ci
* build: refresh uv lock after upstream merge
* fix: restore module import in test_check_migration after merge
The conflict resolution imported only the function but the test body
references check_migration as a module throughout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching
- Move google-generativeai, Pillow, tenacity back to ci group (they are
lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate uv.lock after removing nodejs-wheel-binaries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): use cache/restore instead of cache to prevent cache poisoning
The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert
The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): disable setup-uv cache in publish workflow
Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): remove duplicate verbose_logger mock in test_check_migration
The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): free disk space before Docker build in test-server-root-path
The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add constants.ts with all required exports (key aliases, team IDs)
- Add fixtures/users.ts with all role definitions and storage paths
- Add fixtures/seed.sql for deterministic test database seeding
- Remove Firefox project from playwright config (only Chromium installed)
- Remove unused variable in teams.spec.ts
- Rename CircleCI job to e2e_ui_testing
Add Playwright E2E tests covering proxy admin team and key management
workflows, with a self-contained test runner and CircleCI integration.
Tests cover: create team, invite user, edit/delete team members, create
key in team, regenerate key, update TPM/RPM limits, delete key, and
verify internal user keys are visible.
Infrastructure: run_e2e.sh builds the UI from source before starting
the proxy, ensuring tests always run against the latest UI changes.
Added data-testid attributes to key UI components for reliable selectors.
Redis caching unit tests (test_dual_cache, test_redis_batch_optimizations,
test_router_utils) required Redis secrets that should live in CircleCI.
- Add redis_caching_unit_tests job to CircleCI config
- Delete test-unit-caching-redis.yml GHA workflow
- Remove all Redis plumbing (inputs, secrets, env vars) from
_test-unit-services-base.yml and its callers
The proxy_e2e_azure_batches_tests workflow is consistently flaky and
does not provide reliable signal on whether changes break anything.
Remove the workflow from both CircleCI and GitHub Actions, along with
the test directory it exclusively used.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove @neondatabase/api-client and neonctl to address CVE-2026-25639
(axios supply chain vulnerability). Pin all JS dependencies to exact
versions across all package.json files to prevent future supply chain
attacks via semver range resolution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aioboto3 was listed as a dependency for async sagemaker calls but is not
imported anywhere in the codebase — async calls use httpx + botocore SigV4
instead. Removing it eliminates the unresolvable botocore version conflict
between boto3 and aiobotocore, along with all grep -v / --no-deps workarounds
across Dockerfiles and CI.
Also addresses Greptile review feedback: collapse redundant grpcio
python-version markers, bump pyproject.toml cryptography to 46.0.5 to
match Docker (GHSA-r6ph-v2qm-q3c2), and fix misleading .npmrc comment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
boto3==1.42.80 and aioboto3==15.5.0 have incompatible botocore version
ranges. No aioboto3 release supports botocore 1.42.x yet. Both uv and
pip 26.0.1 reject the resolution.
Fix: filter aioboto3 out of requirements.txt at install time, then
install aioboto3+aiobotocore with --no-deps to bypass resolution.
Added wrapt and aioitertools to requirements.txt as pinned transitive
deps of aiobotocore (skipped by --no-deps). Fixed pip stdin handling
(/dev/stdin). Applied to all 5 Dockerfiles and all CircleCI install
paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
boto3==1.42.80 and aioboto3==15.5.0 have incompatible botocore ranges.
Both uv and pip 26.0.1 reject the resolution. Fix: filter aioboto3 out
of requirements.txt at install time, then install aioboto3+aiobotocore
separately with --no-deps to bypass resolution. Removes uv-overrides.txt
which only partially addressed the conflict.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
boto3==1.42.80 requires botocore>=1.42.80 but aioboto3==15.5.0 (via
aiobotocore==2.25.1) requires botocore<1.40.62. No aioboto3 release
supports botocore 1.42.x yet. pip's lenient resolver handles this for
Docker builds, but uv's strict resolver rejects it in CI. Added
uv-overrides.txt to force botocore to match boto3 during uv installs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CircleCI had stale version pins (e.g. boto3==1.36.0, aioboto3==13.4.0) that
conflict with requirements.txt (boto3==1.42.80, aioboto3==15.5.0), causing
uv resolution failures. Updated all mismatched pins across config.yml and
.circleci/requirements.txt to match requirements.txt as the source of truth.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removes 10 CircleCI jobs that are fully duplicated by GHA workflows:
- caching_unit_tests → test-unit-caching-redis.yml
- litellm_security_tests → test-unit-security.yml
- litellm_proxy_unit_testing_{key_generation,part1,part2} → test-unit-proxy-db.yml + test-unit-proxy-legacy.yml
- litellm_mapped_tests_{llms,core,litellm_core_utils,mcps,integrations} → test-unit-*.yml workflows
Also cleans up upload-coverage requires and workflow entries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* add DD Tracing (#24033)
* feat(models): add Azure GPT-5.4 mini and nano variants (#24045)
Add `azure/gpt-5.4-mini` and `azure/gpt-5.4-nano` to the model
database with official pricing from Azure OpenAI:
- GPT-5.4 mini: $0.75/M input, $0.075/M cached, $4.5/M output
- GPT-5.4 nano: $0.20/M input, $0.02/M cached, $1.25/M output
Both models support:
- 1.05M input / 128K output context window
- Chat, batch, and responses endpoints
- Function calling, tools, vision, reasoning
- Prompt caching with automatic tiered pricing
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add new model pricing details for volcengine Doubao-Seed-2.0 series (#23871)
Add entries for volcengine Doubao-Seed-2.0 series
* fix(mcp): support refresh_token grant type in OAuth token endpoint (#23701)
* fix(mcp): support refresh_token grant type in OAuth token endpoint (#23700)
The .well-known/oauth-authorization-server metadata advertises
refresh_token as a supported grant type, but the token endpoint
rejected it with HTTP 400. This adds refresh_token grant support
so MCP clients can refresh expired tokens without re-authenticating.
* test(mcp): add tests for refresh_token grant type in OAuth token endpoint
* fix(mcp): move code_verifier guard into authorization_code branch
code_verifier is only relevant for authorization_code grants (PKCE).
Move it inside the else branch so it doesn't apply to refresh_token.
* fix(mcp): guard None client_secret and forward scope in token exchange
- Conditionally include client_secret in form data to prevent httpx
from sending the literal string "None" (applies to both
authorization_code and refresh_token branches)
- Forward optional scope parameter per RFC 6749 §6, allowing clients
to request a subset of originally-granted scopes on refresh
* fix(mcp): validate code param in authorization_code grant
Guard against None code being form-encoded as literal string "None"
by httpx, symmetric with the existing refresh_token guard.
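The guards described in the mcp fixes above could look roughly like the sketch below. `build_token_form` and its parameters are assumptions for illustration, not the actual endpoint code; the point is that `None` values are never placed in the form dict, so they cannot be encoded as the string "None".

```python
from typing import Dict, Optional


def build_token_form(
    grant_type: str,
    client_id: str,
    client_secret: Optional[str] = None,
    code: Optional[str] = None,
    refresh_token: Optional[str] = None,
    scope: Optional[str] = None,
) -> Dict[str, str]:
    """Build token-request form data, leaving out every field that is None."""
    if grant_type == "authorization_code":
        if code is None:
            raise ValueError("code is required for the authorization_code grant")
        form = {"grant_type": grant_type, "client_id": client_id, "code": code}
    elif grant_type == "refresh_token":
        if refresh_token is None:
            raise ValueError("refresh_token is required for the refresh_token grant")
        form = {
            "grant_type": grant_type,
            "client_id": client_id,
            "refresh_token": refresh_token,
        }
        # scope is optional on refresh and may narrow the originally granted scopes
        if scope is not None:
            form["scope"] = scope
    else:
        raise ValueError(f"unsupported grant_type: {grant_type}")
    # Only include the secret when one was actually provided
    if client_secret is not None:
        form["client_secret"] = client_secret
    return form


refresh_form = build_token_form("refresh_token", "client-1", refresh_token="rt", scope="read")
```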
* docs: add incident report for guardrail logging secret exposure (#24059)
Add blog post documenting the guardrail logging path exposing internal
request data (e.g. Authorization headers) in spend logs and OTEL traces.
Fix available in LiteLLM 1.82.3+.
Made-with: Cursor
* [Fix] Datadog LLM Observability tags format (env, service, version missing) (#23673)
* tag fix
* greptile comment
* fix(ci): stabilize 6 failing CI jobs
1. mypy: remove duplicate type annotation for token_data in discoverable_endpoints.py
2. integrations tests: add parameterized to CI test deps
3. doc quality: document OTEL_IGNORE_CONTEXT_PROPAGATION env key
4. security: allowlist CVE-2026-2673, CVE-2026-3644, CVE-2026-4224 (no fix available)
5. proxy_store_model_in_db: fix missing x-litellm-call-id header on error responses
6. google tests: add --retries 3 for transient Vertex AI rate limits
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(streaming): handle RuntimeError during model_copy in streaming handler
The race condition occurs when model_copy(deep=True) tries to deepcopy
_hidden_params dict while it's being concurrently modified by logging
callbacks. Fall back to shallow copy if the deep copy fails.
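A minimal sketch of the deep-copy-with-fallback idea, using the standard library `copy` module rather than `model_copy`. `Chunk`, `MutatingDict`, and `safe_copy` are hypothetical names; `MutatingDict` only simulates the error a dict raises when it changes during a deep copy.

```python
import copy


class Chunk:
    """Hypothetical stand-in for a streaming response object."""

    def __init__(self, hidden_params):
        self.hidden_params = hidden_params


class MutatingDict(dict):
    """Simulates a dict that is modified while it is being deep-copied."""

    def __deepcopy__(self, memo):
        raise RuntimeError("dictionary changed size during iteration")


def safe_copy(obj):
    """Prefer a deep copy; fall back to a shallow copy if the deep copy races."""
    try:
        return copy.deepcopy(obj)
    except RuntimeError:
        return copy.copy(obj)


chunk = Chunk(MutatingDict(request_id="abc"))
copied = safe_copy(chunk)
```

The trade-off is that after a fallback the copy shares `hidden_params` with the original, so later mutations are visible through both objects.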
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(cost): handle non-string traffic_type in cost calculator + add retries
1. Fix AttributeError in _map_traffic_type_to_service_tier when traffic_type
is an integer (cast to str before calling .upper()). This was causing
pass-through vertex spend logging to fail silently.
2. Add --retries to llm_translation_testing for flaky external API calls.
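For the traffic_type fix in item 1 above, a hedged sketch of the cast-before-upper idea. The mapping table and the function name `map_traffic_type` are assumptions for illustration; the real `_map_traffic_type_to_service_tier` table may differ.

```python
from typing import Any, Optional

# Hypothetical mapping for illustration only; the real table may differ.
_TIER_BY_TRAFFIC_TYPE = {"ON_DEMAND_PRIORITY": "priority", "FLEX": "flex"}


def map_traffic_type(traffic_type: Any) -> Optional[str]:
    """Map a provider traffic type to a service tier, tolerating non-string input."""
    if traffic_type is None:
        return None
    # Cast to str first so integers do not raise AttributeError on .upper()
    return _TIER_BY_TRAFFIC_TYPE.get(str(traffic_type).upper())
```

An integer input now falls through to `None` instead of raising, so spend logging can continue with the default tier.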
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
---------
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ExMatics HydrogenC <33123710+HydrogenC@users.noreply.github.com>
Co-authored-by: Jack Venberg <jack.venberg@rover.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock
The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): route OpenAI models through chat completions in pass-through tests
The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): fix assemblyai custom auth and router wildcard test flakiness
1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
user can access management endpoints like /key/generate. The test
test_assemblyai_transcribe_with_non_admin_key was hidden behind an
earlier -x failure and was never reached before.
2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
to 2s for test_router_get_model_group_usage_wildcard_routes. The async
callback needs time to write usage to cache, and 1s is insufficient on
slower CI hardware.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* ci: retrigger CI pipeline
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic
Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)
* fix: don't close HTTP/SDK clients on LLMClientCache eviction
Remove the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:
RuntimeError: Cannot send a request, as the client has been closed.
This is a regression from commit fb72979432. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().
Fixes production crashes after the 1-hour cache TTL expires.
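The no-close-on-eviction behavior can be illustrated with a toy cache. `FakeClient` and `ClientCache` are hypothetical stand-ins, not LLMClientCache itself; the sketch only shows why an evicted client must stay open while something still holds it.

```python
class FakeClient:
    """Hypothetical stand-in for an HTTP/SDK client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def send(self):
        if self.closed:
            raise RuntimeError("Cannot send a request, as the client has been closed.")
        return "ok"


class ClientCache:
    """Tiny size-bounded cache that drops evicted entries without closing them."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = {}

    def set(self, key, value):
        if key not in self._items and len(self._items) >= self.max_size:
            oldest = next(iter(self._items))  # dicts keep insertion order
            self._remove_key(oldest)
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)

    def _remove_key(self, key):
        # Only forget the entry; a request in flight may still hold the client.
        self._items.pop(key, None)


cache = ClientCache(max_size=1)
cache.set("a", FakeClient())
in_flight = cache.get("a")      # a streaming request keeps this reference
cache.set("b", FakeClient())    # evicts "a"
status = in_flight.send()       # still usable after eviction
```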
* test: update LLMClientCache unit tests for no-close-on-eviction behavior
Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.
Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.
* test: add e2e tests for OpenAI SDK client surviving cache eviction
Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
eviction doesn't close the client
Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.
Also expand the module docstring to explain why the sleep is required.
* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction
* docs(CLAUDE.md): add HTTP client cache safety guideline
* [Fix] Install bsdmainutils for column command in security scans
The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle string callback values in prometheus multiproc setup
When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
TypeError: can only concatenate str (not "list") to str
Normalize each callback setting to a list before concatenating.
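A minimal sketch of that normalization, assuming a settings dict with several callback keys. `as_list` and the keys other than `callbacks` are illustrative assumptions.

```python
from typing import List, Union


def as_list(value: Union[None, str, List[str]]) -> List[str]:
    """Return the callback setting as a list, whether it was unset, a string, or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# Hypothetical settings: one callback given as a plain string, one as a list.
settings = {"callbacks": "my_callback", "success_callback": ["prometheus"]}

all_callbacks = (
    as_list(settings.get("callbacks"))
    + as_list(settings.get("success_callback"))
    + as_list(settings.get("failure_callback"))
)
print(all_callbacks)  # ['my_callback', 'prometheus']
```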
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* bump: version 1.82.2 → 1.82.3
* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement
The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing
When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.
Now checks whether the router_model_id entry actually has pricing before preferring it.
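The selection rule can be sketched as follows. `select_model_for_cost`, the shape of `model_cost`, and the sample values are assumptions for illustration, not the actual `_select_model_name_for_cost_calc` implementation.

```python
from typing import Dict, Optional

_PRICING_KEYS = ("input_cost_per_token", "output_cost_per_token")


def select_model_for_cost(
    router_model_id: Optional[str],
    model_name: str,
    model_cost: Dict[str, dict],
) -> str:
    """Prefer the router deployment id only when its entry actually carries pricing."""
    entry = model_cost.get(router_model_id) if router_model_id else None
    if entry and any(entry.get(key) is not None for key in _PRICING_KEYS):
        return router_model_id
    return model_name


# Hypothetical cost map: the deployment hash is registered but has no pricing.
model_cost = {
    "deployment-hash": {"max_tokens": 4096},
    "my-model": {"input_cost_per_token": 1e-6, "output_cost_per_token": 2e-6},
}
chosen = select_model_for_cost("deployment-hash", "my-model", model_cost)
```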
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Part1 had 4 test files combined (was originally 2), causing cross-file
state pollution under xdist. Reverted to original grouping.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GCS cache tests (test_gcs_cache_unit_tests.py) rely on module-level state
(vertex_chat_completion singleton, credential caches) that importlib.reload
resets but the xdist-safe function-scoped fixture does not. Removing -n 4
from this job restores single-process execution where module reload properly
resets all state before each test, while CI-level parallelism (parallelism: 2)
still splits test files across nodes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aurelio_sdk imports requests_toolbelt at load time, so it needs its deps.
Unlike semantic_router, aurelio_sdk has no conflict with openai>=2, so
--no-deps is unnecessary. Verified via uv dry-run locally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from expensive Linux machine (medium) to docker xlarge executor.
Drop miniconda, manual Docker CLI install, and manual PostgreSQL container
in favor of cimg/python:3.13, setup_remote_docker, and service container.
Use uv + cache for dependency installation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
semantic_router imports aurelio_sdk at module load time, so it must be
installed even when using --no-deps.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Downgrade langfuse, assistants, and Python 3.13 install jobs to medium (were defaulting to large at ~25% CPU).
Bump enterprise and image_gen xdist workers to -n 4 on explicit large instances.
Drop coverage collection and persist_to_workspace for 4 jobs that no longer need it.
Downgrade verbosity from -vv to -v across all 5 jobs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch pip install -r requirements.txt to uv pip install --system -r requirements.txt
for all docker-based jobs that use the main requirements.txt. This applies the same
optimization already proven in the mapped test jobs to the rest of the CI pipeline.
Also adds --no-deps to semantic_router installs in guardrails_testing and
litellm_mapped_enterprise_tests to avoid uv's strict resolution conflict with openai>=2.
Skipped: machine executor + conda jobs (security, proxy_spend_accuracy,
proxy_multi_instance, proxy_store_model_in_db) and Group B jobs using
.circleci/requirements.txt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
uv's strict resolver rejects transitive dep conflicts (semantic-router
wants openai<2, llm-sandbox wants pydantic>=2.11.5). Use uv for the
heavy requirements.txt install and pip for the small test dep batch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch setup_litellm_test_deps from pip to uv with batched installs
- Cache installed site-packages (~/.local/lib, ~/.local/bin) instead of
pip download cache for near-instant installs on cache hit
- Remove unused coverage collection from 6 mapped test jobs (only mcps
coverage is consumed by the coverage combine step)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Router tests: expand conftest save/restore to cover all globals mutated
by router tests (default_fallbacks, tag_budget_config, request_timeout,
enable_azure_ad_token_refresh, num_retries_per_request, model_cost,
token_counter). These were leaking across xdist workers.
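A hedged sketch of the save/restore idea, written as a plain context manager instead of a pytest fixture so it runs on its own. `preserve_attrs` and `fake_litellm` are hypothetical; in a conftest the same logic would wrap the `yield` of a fixture.

```python
import copy
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def preserve_attrs(target, names):
    """Snapshot the named attributes and put them back on exit."""
    # Shallow-copy each value so in-place edits to dicts/lists are also undone.
    saved = {n: copy.copy(getattr(target, n)) for n in names if hasattr(target, n)}
    try:
        yield target
    finally:
        for n in names:
            if n in saved:
                setattr(target, n, saved[n])
            elif hasattr(target, n):
                delattr(target, n)  # attribute did not exist before the test


# Hypothetical stand-in for a module holding mutable globals.
fake_litellm = SimpleNamespace(num_retries_per_request=None, model_cost={"gpt": 1})

with preserve_attrs(fake_litellm, ["num_retries_per_request", "model_cost", "request_timeout"]):
    fake_litellm.num_retries_per_request = 5
    fake_litellm.model_cost["other"] = 2
    fake_litellm.request_timeout = 30
```

Compared with reloading the module for every test, this touches only the attributes that are known to be mutated, so every name a test changes has to be listed.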
Proxy tests: move test_proxy_utils.py (169 parametrized) and
test_proxy_server.py (72 parametrized) from part2 to part1, balancing
~370 vs ~360 tests (was ~129 vs ~600).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Increase pytest-xdist parallelism to match available CPU on I/O-bound and
CPU-bound test jobs. Drop coverage collection from 8 jobs (still collected
by ~15 other jobs). Add dependency caching to 4 uncached jobs. Reduce
verbose output (-vv to -v) and remove -s/--log-cli-level overhead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Batching pip installs surfaced more hidden conflicts:
- respx==0.22.0 requires httpx>=0.25.0, conflicting with httpx==0.24.1
- traceloop-sdk==0.21.1 requires otel-semantic-conventions<0.46,
conflicting with opentelemetry-sdk==1.25.0 (needs ==0.46b0)
These were masked before because separate pip install calls let later
installs silently override earlier pins. Dropping the pins lets pip
resolve compatible versions. Verified with pip --dry-run locally.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Batching pip installs exposed a dependency conflict: langfuse==2.59.7
requires anyio>=4.4.0, which conflicts with the anyio==4.2.0 pin.
Dropping the pin lets pip resolve a compatible version.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Several test jobs were underutilizing their CPU allocation (~25%) because
they were either missing pytest-xdist -n or using -n 2 on 4-vCPU machines.
Batching individual pip install calls into single commands reduces resolver
overhead and saves ~30-60s per job.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add LITELLM_LOG=WARNING to suppress verbose DEBUG log output
- Remove -s flag so pytest captures stdout again instead of streaming it to the job log
- Bump xdist workers from -n 2 to -n 4
- Add --timeout=120 for safety
- Rewrite conftest.py to use save/restore pattern (matching guardrails_tests)
instead of per-function importlib.reload + event loop creation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These jobs run on medium (2 CPU) but weren't using pytest-xdist,
leaving the second CPU idle. Added pytest-xdist dep and -n 2 to:
- auth_ui_unit_tests (~33 tests)
- litellm_router_unit_testing (~191 tests)
- mcp_testing (~112 tests)
- llm_responses_api_testing (~80 tests)
- search_testing (~53 tests)
- batches_testing (~45 tests)
- litellm_utils_testing (~205 tests)
- pass_through_unit_testing (~102 tests)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use ubuntu-2204:2024.04.1 which ships with a recent Docker, eliminating
the 1-minute `curl get.docker.com | sh` upgrade step
- Switch image save/load from gzip to zstd -1 -T0 for ~3-5x faster
compression/decompression, saving ~30s on save and on each downstream load
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8 jobs had -n workers set higher than available vCPUs, causing context
switch overhead and degraded performance. Aligned -n to match resource_class:
- medium (2 CPU): enterprise -n 8→2, image_gen/logging/guardrails -n 4→2
- large (4 CPU): proxy_part1/llms/core/integrations -n 8→4
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ui_build (medium → medium+) is on the critical path and blocks 3 downstream
jobs. ui_unit_tests (medium+ → large, maxForks 3 → 5) targets ~7 min, down from ~11 min.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>