Per Yuneng's feedback, use a single @pytest.mark.vcr marker so one record
sweep populates cassettes for every marked test across all providers,
instead of forcing each test to bind to a hard-coded cassette path.
Changes vs. the initial scaffolding:
- Add 'pytest-recording==0.13.4' on top of vcrpy. Adopt its layout:
cassettes live at 'cassettes/<test_module>/<test_name>.yaml', resolved
automatically. New tests just decorate with '@pytest.mark.vcr' — no
imports or path bookkeeping.
- Move the shared filter/match config into a 'vcr_config' fixture in
'tests/llm_translation/conftest.py' (consumed by pytest-recording for
every marked test in the dir). Drop the standalone 'vcr_config.py'.
- Bulk record / replay via the standard '--record-mode' CLI flag:
'make test-llm-translation-record' now sweeps every '@pytest.mark.vcr'
test under tests/llm_translation in one shot. Optional 'TARGET=' var
scopes to a single file.
- Move existing cassettes to the per-test paths and update the local
in-process Anthropic regenerator to write to the same paths.
- Refresh README + Makefile target docs to match the sweep workflow.
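
As a rough illustration of the per-test layout described above, the sketch below computes where a cassette would land for a given module and test. The helper name and the example test name are hypothetical; pytest-recording resolves this path itself, so nothing like this needs to exist in the repo.

```python
from pathlib import Path


def cassette_path(tests_dir: Path, test_module: str, test_name: str) -> Path:
    """Return cassettes/<test_module>/<test_name>.yaml under tests_dir."""
    return tests_dir / "cassettes" / test_module / f"{test_name}.yaml"


# Hypothetical module and test names, for illustration only.
example = cassette_path(
    Path("tests/llm_translation"),
    "test_anthropic_completion_vcr",
    "test_non_streaming_completion",
)
print(example.as_posix())
```

Because the path is derived from the module and test name, renaming a test also changes which cassette it replays from.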
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Live LLM e2e tests have been draining provider billing accounts and going
flaky on outages (LIT-2683). This change introduces vcrpy-backed cassette
replay so CI can exercise the same end-to-end LiteLLM transformation paths
without hitting the live provider:
- Add 'vcrpy==8.1.1' to the dev dependency group.
- New 'tests/llm_translation/vcr_config.py' centralises the VCR config:
filters auth/secret headers and per-request response headers, matches on
method+URI+body, and exposes 'LITELLM_VCR_RECORD_MODE' for re-recording.
- New 'tests/llm_translation/test_anthropic_completion_vcr.py' demonstrates
the pattern with one non-streaming and one streaming Anthropic test that
replay from cassettes shipped under 'cassettes/'.
- New 'tests/llm_translation/cassettes/_record_anthropic_fixtures.py' lets
contributors regenerate the canned Anthropic cassettes against a local
in-process mock (no API key required), and 'cassettes/README.md' documents
the full record/replay/refresh workflow.
- New 'make test-llm-translation-record FILE=...' Makefile target to refresh
cassettes against the live API.
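
A minimal sketch of what the centralised config could look like, assuming it is expressed as keyword arguments for vcrpy. The header names and the "none" default are assumptions made for this example, not the contents of the real vcr_config.py; response-header scrubbing is left out.

```python
import os

# Headers assumed to be sensitive for this sketch; the real list lives in the project config.
SENSITIVE_HEADERS = ["authorization", "x-api-key"]


def build_vcr_kwargs(env=None):
    """Build vcrpy-style settings: scrub secrets, match on method+URI+body."""
    env = os.environ if env is None else env
    return {
        # Drop auth/secret request headers before they are written to a cassette.
        "filter_headers": SENSITIVE_HEADERS,
        # A recorded interaction is replayed only if all three parts match.
        "match_on": ["method", "uri", "body"],
        # Assumed default: replay only, unless the env var asks to re-record.
        "record_mode": env.get("LITELLM_VCR_RECORD_MODE", "none"),
    }


print(build_vcr_kwargs({"LITELLM_VCR_RECORD_MODE": "all"})["record_mode"])
```

Matching on the body as well as method and URI keeps two different prompts to the same endpoint from replaying each other's responses.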
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
- Dockerfile: pin the unscoped `brace-expansion@5.0.5` alongside
`@isaacs/brace-expansion@5.0.1`. The scoped package only has 5.0.0
and 5.0.1 published; CVE-2026-33750's fix (5.0.5) is on the unscoped
package which npm also vendors. The override loop now swaps both.
- Revert `black` 26.3.1 -> 24.10.0, `pytest` 9.0.3 -> 8.3.5, and
`pytest-asyncio` 1.3.0 -> 1.2.0. The major-version bumps cause CI
lint (black reformats hundreds of files) and code-quality
(liccheck.ini has no entry for the new versions) failures. Both
CVEs are dev-only; skipping the upgrades leaves no runtime exposure.
* bump litellm-proxy-extras version to 0.4.67
* bump litellm-proxy-extras pin to 0.4.67 in litellm pyproject
* regenerate uv.lock for litellm-proxy-extras 0.4.67
* bump litellm-enterprise version to 0.1.38
* bump litellm-enterprise pin to 0.1.38 in litellm pyproject
* regenerate uv.lock for litellm-enterprise 0.1.38
All three dependency bumps in this PR resolve on Python 3.10, so there
is no need to jump the floor all the way to 3.11. Also restore the
py3.10-specific lunary==1.4.36 pin that was collapsed when the floor
was temporarily at 3.11.
Now that requires-python starts at 3.11, the "python_version >= '3.9'"
and ">= '3.10'" markers are unconditionally true, and the "< '3.10'"
entries for psycopg, Pillow, pyarrow, langchain, lunary, and pylint can
never resolve. Drop the dead markers and remove the unreachable pins so
the dependency list reflects what actually gets installed.
Bumps orjson, fastapi-sso, and python-multipart to their latest releases
in the proxy extra, and raises the project python floor to 3.11 so the
updated pins can resolve. CI already runs on 3.11 / 3.12 / 3.13 and the
Docker images ship python 3.13, so the floor change aligns the declared
support range with what is actually tested and shipped.
langgraph-prebuilt 1.0.9 imports ExecutionInfo and ServerInfo from
langgraph.runtime, but those symbols are not exported until
langgraph 1.1.0. Our pin of langgraph==1.0.10 allows
langgraph-prebuilt<1.1.0,>=1.0.8, and uv resolves to 1.0.9 (the
latest in range), which breaks at import time in every test that
touches langgraph.prebuilt (e.g. tests/pass_through_tests/test_mcp_routes.py):
ImportError: cannot import name 'ExecutionInfo' from 'langgraph.runtime'
Pinning langgraph-prebuilt to 1.0.8 pairs correctly with
langgraph==1.0.10 and restores the import path.
Noma v1 resolved application_id from user_api_key_alias when no explicit
value was set (PR #16832). Noma v2 (PR #21400) was rewritten from scratch
and this fallback was not ported, causing all requests from shared LiteLLM
instances to appear as a single generic "litellm" application in the Noma
dashboard — breaking per-user traceability.
Fix: after checking dynamic_params and self.application_id, fall back to
user_api_key_alias from litellm_metadata or metadata. This matches the
pattern used by PromptSecurityGuardrail._resolve_key_alias_from_request_data()
and restores the v1 behavior where each API key gets its own application
entry in the Noma dashboard.
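
The resolution order described above can be sketched roughly as follows. The function name, its signature, and the "litellm" default argument are illustrative assumptions, not the guardrail's actual code.

```python
from typing import Any, Mapping, Optional


def resolve_application_id(
    dynamic_params: Mapping[str, Any],
    configured_application_id: Optional[str],
    request_data: Mapping[str, Any],
    default: str = "litellm",
) -> str:
    """Pick the application id: explicit value, configured value, then key alias."""
    explicit = dynamic_params.get("application_id")
    if explicit:
        return str(explicit)
    if configured_application_id:
        return configured_application_id
    # Fallback: user_api_key_alias from litellm_metadata, then metadata.
    for key in ("litellm_metadata", "metadata"):
        metadata = request_data.get(key) or {}
        if isinstance(metadata, Mapping):
            alias = metadata.get("user_api_key_alias")
            if alias:
                return str(alias)
    return default


print(resolve_application_id({}, None, {"metadata": {"user_api_key_alias": "team-a"}}))
```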
Fixes #25794
Co-authored-by: Brendan Smith-Elion <brendan.smith-elion@arcadia.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1. exclude-newer: change from absolute "2026-04-10" to relative "3 days".
All pinned deps were published before the 3-day cutoff. Re-locked so
uv lock --check passes in test-mcp.yml and test-linting.yml.
2. test_eager_tiktoken_load: run all 10 env var values in a single
subprocess instead of spawning 10 separate processes. Each cold
'import litellm' takes ~78s on CI, so the old loop took ~13 min on a
single xdist worker. Now takes ~78s total.
3. proxy-db remaining timeout: increase from 20 to 30 minutes. The
remaining group has 51 test files and was consistently timing out at
71% across all branches (pre-existing issue, not migration-related).
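
The single-subprocess approach from item 2 might look roughly like this. The env var name and the str_to_bool stand-in are assumptions for the sketch; the real test would pay the expensive import once inside the child instead of using a stub.

```python
import json
import subprocess
import sys

# Child program: loops over every value inside ONE interpreter, so any
# expensive start-up cost would be paid once rather than once per value.
CHILD_SCRIPT = r"""
import json, os, sys

def str_to_bool(value):
    # Stand-in for the project's helper; its exact behaviour is assumed here.
    return value.strip().lower() in {"true", "1", "yes", "on"}

results = {}
for value in json.loads(sys.argv[1]):
    os.environ["EXAMPLE_EAGER_LOAD"] = value  # hypothetical env var name
    results[value] = str_to_bool(os.environ["EXAMPLE_EAGER_LOAD"])
print(json.dumps(results))
"""


def check_values_in_one_subprocess(values):
    """Run all cases in a single child process and return {value: parsed_bool}."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, json.dumps(values)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


results = check_values_in_one_subprocess(["true", "True", "1", "false", "0"])
print(results)
```

The trade-off is isolation: the cases now share one interpreter, so any state set by an earlier value has to be reset inside the loop.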
1. Cap requires-python to <3.14 — no deps ship 3.14 wheels yet, and
uv's cross-version resolver fails on the Python 3.14 split.
2. Change exclude-newer from relative "30 days" to absolute "2026-04-10"
so the lockfile stays reproducible. The relative date caused
cryptography==46.0.7 (published April 8) to fall outside the window.
3. Parametrize test_eager_loading_env_var_values instead of looping —
with xdist the 6 subprocess cases can run in parallel instead of all
running sequentially on one worker (~13 min → ~2 min).
Also removed redundant case variants (Yes/YES/On/ON) that test the
same str_to_bool code path.
* build: migrate packaging metadata to uv
* ci: move automation and local tooling to uv
* docker: migrate image builds and runtime setup to uv
* docs: update install and deployment guidance for uv
* chore: align auxiliary scripts and tests with uv
* test: harden test_litellm isolation
* fix: keep release and health check images self-contained
* build: pin uv tooling and health check deps
* test: isolate bedrock image request formatting from suite state
* test: cover sandbox executor requirements flow
* ci: fix circleci no-op command steps
* ci: fix circleci publish workflow parsing
* fix: stabilize remaining uv migration CI checks
* ci: increase matrix test timeout headroom
* fix: restore published docker and license coverage
* fix: restore proxy runtime build parity
* fix: restore proxy extras parity and venv migrations
* ci: persist uv path across circleci steps
* fix: keep psycopg binary in default test env
* docker: preserve prisma cache across stages
* test: run local proxy checks through uv python
* build: restore runtime deps moved into ci
* build: refresh uv lock after upstream merge
* fix: restore module import in test_check_migration after merge
The conflict resolution imported only the function but the test body
references check_migration as a module throughout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching
- Move google-generativeai, Pillow, tenacity back to ci group (they are
lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate uv.lock after removing nodejs-wheel-binaries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): use cache/restore instead of cache to prevent cache poisoning
The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert
The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): disable setup-uv cache in publish workflow
Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): remove duplicate verbose_logger mock in test_check_migration
The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): free disk space before Docker build in test-server-root-path
The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update CLAUDE.md with qwen3 tool_calls bug fix instructions (#18922)
* fix(ollama): set finish_reason to "tool_calls" when tool_calls present
When qwen3 models return tool_calls through Ollama, the finish_reason
was incorrectly left as "stop" instead of being set to "tool_calls".
This caused clients to miss the tool_calls in the response.
Added _get_finish_reason helper method following OpenAI provider's
pattern, and fixed both streaming and non-streaming response paths.
Fixes: https://github.com/BerriAI/litellm/issues/18922
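
A minimal sketch of the idea behind the helper, assuming the response message is a plain dict. The name and signature here are illustrative and may differ from the actual _get_finish_reason method.

```python
from typing import Any, Mapping, Optional


def get_finish_reason(message: Mapping[str, Any], done_reason: Optional[str] = None) -> str:
    """Return "tool_calls" when the message carries tool calls, else the provider reason."""
    if message.get("tool_calls"):
        return "tool_calls"
    # No tool calls: keep whatever the provider reported, defaulting to "stop".
    return done_reason or "stop"


print(get_finish_reason({"tool_calls": [{"function": {"name": "get_weather"}}]}))
```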
* fix(ollama): pass tools directly without model capability check
The previous code tried to check model capability via get_model_info()
which made network calls to localhost:11434. When Ollama is remote,
this fails and falls back to JSON format, breaking tool calling.
Ollama 0.4+ supports native tool calling - let Ollama handle
model capability detection instead of LiteLLM.
Fixes #18922
* fix(ollama): transform tool_calls response to OpenAI format
Ollama returns tool_calls with arguments as dict, but OpenAI format
requires arguments to be a JSON string. Also ensures 'type': 'function'
field is present.
Completes the fix for #18922
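
The transformation described above can be sketched like this, assuming Ollama tool calls arrive as dicts with a "function" entry. The function name and the generated "call_..." id are assumptions for illustration, not the code in this change.

```python
import json
import uuid


def to_openai_tool_calls(ollama_tool_calls):
    """Convert Ollama-style tool calls into OpenAI-style tool call dicts."""
    converted = []
    for call in ollama_tool_calls:
        function = call.get("function", {})
        arguments = function.get("arguments", {})
        # OpenAI format expects arguments as a JSON string, not a dict.
        if not isinstance(arguments, str):
            arguments = json.dumps(arguments)
        converted.append(
            {
                # Assumed: synthesize an id when the provider did not send one.
                "id": call.get("id") or f"call_{uuid.uuid4().hex}",
                "type": "function",
                "function": {"name": function.get("name", ""), "arguments": arguments},
            }
        )
    return converted


sample = [{"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}]
converted = to_openai_tool_calls(sample)
print(converted[0]["function"]["arguments"])
```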
* fix(ollama): set finish_reason to "tool_calls" when tool_calls present
Fixes #18922
Two issues addressed:
1. Remove broken model capability check
- get_model_info() fails when Ollama runs on remote server
- Broken fallback triggered JSON prompt injection
- Now passes tools directly - Ollama 0.4+ handles detection
2. Set finish_reason correctly
- Was hardcoded to "stop" even with tool_calls present
- Clients use this to know how to process the response
- Now returns "tool_calls" when tool_calls are in response
Both streaming and non-streaming responses are fixed.
Tests:
- All 14 existing Ollama tests pass
- Added 3 focused tests for the fixes