litellm

Author	SHA1	Message	Date
yuneng-jiang	bac2590b39	build(deps): bump pyjwt to 2.13.0 and ws override to 8.20.1 (#29982) Raise the PyJWT floor in pyproject (>=2.13.0,<3.0) and re-resolve uv.lock so the proxy installs 2.13.0 instead of 2.12.0. Bump the ws transitive-version override in the dashboard from 8.19.0 to 8.20.1 and regenerate package-lock; jsdom and openai both dedupe onto the single 8.20.1 copy. Both are routine dependency maintenance bumps to keep pinned versions current.	2026-06-08 16:39:21 -07:00
yuneng-jiang	f1667b9137	chore(deps): bump deps (#29860) * bump: version 0.4.73 → 0.4.74 * bump: version 1.88.0 → 1.89.0 * uv lock	2026-06-06 21:44:54 +00:00
yuneng-jiang	28c0d8579b	chore(deps): bump deps (#29373) * bump: version 0.1.41 → 0.1.42 * uv lock	2026-05-30 20:41:23 -07:00
Yassin Kortam	d82eb33a60	feat(otel): typed semconv-aligned OpenTelemetry instrumentation (#28909)	2026-05-29 23:15:27 -07:00
yuneng-jiang	ffc113b428	chore(ci): bump version (#29242) * bump: version 1.87.0 → 1.88.0 * uv lock	2026-05-28 18:49:04 -07:00
yuneng-jiang	5e2d75d75d	bump deps (#29208 ) (#29226 ) * fix(deps): bump vulnerable proxy dependencies (starlette/fastapi, granian, pyarrow, semantic-router) Resolve known CVEs flagged by osv-scanner/grype against uv.lock. All bumped versions verified to resolve, install, and pass the proxy auth/route/middleware unit suites (717 tests) plus an import smoke on the new stack. - starlette 0.50.0 -> 1.1.0 (CVE-2026-48710 "BadHost", GHSA-86qp-5c8j-p5mr): versions <1.0.1 reconstruct request.url from the unvalidated Host header, poisoning request.url.path. Required raising fastapi 0.124.4 -> 0.136.3, which dropped fastapi's starlette<0.51.0 cap; an explicit starlette>=1.0.1 floor blocks regression to a vulnerable transitive resolution. The proxy's own auth already reads scope["path"] via get_request_route, but the locked starlette still flagged in container scanners and left other request.url consumers exposed. - granian 2.5.7 -> 2.7.4 (CVE-2026-42544, unauthenticated DoS via WebSocket subprotocol header panic; CVE-2026-42545, WSGI response-header-panic DoS). granian is a selectable proxy server (proxy_cli). - pyarrow 22.0.0 -> 23.0.1 (CVE-2026-25087 / PYSEC-2026-113). - semantic-router 0.1.12 -> 0.1.15: 0.1.12 was yanked (CVE-2026-42208 — its unbounded litellm pin could resolve a credential-exfiltrating litellm==1.82.8 wheel). Not fixable by bump: diskcache 5.6.3 (CVE-2025-69872, unsafe pickle deserialization) has no upstream fix and is left pinned; exploiting it requires write access to the local cache directory. Relock side effect: sse-starlette 3.4.2 -> 3.4.4. * deps: relax exact pins in optional extras to compatible ranges The proxy/optional extras exact-pinned every dependency, which (1) forces downstream `pip install litellm[proxy]` consumers into version lockstep and (2) blocks them from pulling transitive security patches without forking — the structural cause behind needing a litellm release to clear the starlette CVE in the previous commit. 
Convert the ordinary extras deps to `>=current,<next_major` ranges, mirroring the core [project].dependencies style. Reproducibility for litellm's own Docker/CI is unaffected: images install via `uv sync --frozen`, and the lock re-resolves to the identical versions (no locked version changed). Kept exact-pinned: - litellm-proxy-extras, litellm-enterprise — litellm's own sub-packages, versioned in lockstep with the release. - opentelemetry-api/sdk/exporter-otlp — must resolve to matching versions. - grpcio — supply-chain-pinned to a vetted, aged release. Also corrects the stale comment claiming the extras are exact-pinned for Docker reproducibility (the images use the lock, not these pins). * fix(ci): resolve license-check lookup version from the floor for ranged deps check_licenses.py derived the PyPI lookup version with `next(iter(req.specifier))`, which returns an arbitrary specifier clause. For a range like `>=0.12.1,<1.0` it picked the upper bound (`1.0`) — a version that doesn't exist on PyPI — so the license lookup 404'd and the package was flagged as having an unknown license. The previous commit's switch from exact pins to ranges exposed this for soundfile, pyroscope-io, redisvl, diskcache, and mlflow (the ranged deps not already in liccheck.ini's allowlist). Prefer a lower-bound/exact version (a real released version) for the lookup. * fix(proxy): set strict_content_type=False on the FastAPI app Starlette 1.0 / FastAPI 0.13x flipped the default to strict_content_type=True, which refuses to parse a JSON request body when the client omits the Content-Type header. The proxy previously accepted those requests, so the fastapi/starlette bump in this PR would silently break clients that don't send a Content-Type. Restore the prior lenient behavior explicitly. Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>	2026-05-28 16:48:14 -07:00
harish-berri	d04373f4ce	Add granian as an ASGI-compliant web server. Provides better throughput stability (#26027) * Add granian as an ASGI-compliant web server. Provides better stability, 10-20 RPS improvement under standard LT conditions. TODO: Verify poetry lock details and add locust numbers to PR * Update granian version in license_cache.json and pyproject.toml to 2.5.7 * Enhance proxy CLI tests by adding SSL initialization checks for Granian server. Remove Python version skip conditions and implement tests to ensure SSL certificate and key are required for server initialization. * update uv lock to fix granian import error	2026-05-21 19:08:37 -07:00
yuneng-jiang	2a5dfcd5bc	build(deps-dev): bump black to 26.3.1 and apply formatting (#28525) * build(deps-dev): bump black 24.10.0 -> 26.3.1 * style: apply black 26.3.1 formatting * chore: authorize black 26.3.1 license in liccheck.ini	2026-05-21 17:24:18 -07:00
yuneng-jiang	1480ec698b	chore(ci): bump versions (#28287) * bump: version 0.4.72 → 0.4.73 * bump: version 1.86.0 → 1.87.0 * uv lock	2026-05-19 15:10:37 -07:00
yuneng-jiang	cf9b5e4fa7	[Infra] Bump versions (#28094) * bump: version 0.1.40 → 0.1.41 * bump: version 1.85.0 → 1.86.0 * add uv lock	2026-05-16 18:31:43 -07:00
yuneng-jiang	fbb39ef94d	build(deps): pin openai==2.33.0 in uv.lock (#28088) openai 2.34.0 began rejecting an explicitly-passed empty-string api_key at client construction (raises OpenAIError before any request), which broke tests/local_testing/test_exceptions.py::test_exception_with_headers and related cases after uv.lock floated openai 2.33.0 -> 2.36.0. Pin back to 2.33.0 (within the existing pyproject >=2.20.0,<3.0.0 range) as a temporary stopgap; longer-term fix to follow.	2026-05-16 14:49:31 -07:00
Yassin Kortam	014cb8fa9d	feat: add componentized proxy deployment with gateway, backend, ui, and migrations (#27557) Split the monolithic LiteLLM proxy into independently scalable Kubernetes components to allow separate horizontal scaling of the LLM data plane and management API surfaces - Add DatabaseURLSettings pydantic-settings model that assembles DATABASE_URL (and optional DATABASE_URL_READ_REPLICA) from discrete DATABASE_* env vars before Prisma initializes, supporting both IAM token auth (minting short-lived RDS tokens) and password auth; replaces the CLI-only path that componentized entrypoints bypass - Add gateway component (port 4000) that trims the proxy route table to the LLM data-plane surface (chat, embeddings, completions, audio, realtime, provider passthroughs, health/metrics) via an allowlist applied inside the lifespan context so plugin-registered routes are captured - Add backend component (port 4001) that exposes the management/admin surface (keys, users, teams, orgs, spend analytics, model management, SSO, audit logs) with a complementary allowlist - Add ui component: Next.js static export served by nginx (port 3000) with RSC payload routing, asset prefix aliasing, and SPA fallback for dashboard routes - Add migrations component with dedicated Dockerfile that runs prisma migrate deploy via a Helm pre-install/pre-upgrade Job, eliminating per-pod schema contention on the Prisma advisory lock - Add Helm chart (helm/litellm) with separate Deployments, Services, HPAs, and ConfigMap for each component; shared _helpers.tpl emits DATABASE_, IAM_TOKEN_DB_AUTH, REDIS_, and DISABLE_SCHEMA_UPDATE env vars from chart values; ingress template routes traffic to the correct component by path prefix - Add comprehensive tests for DatabaseURLSettings covering IAM auth, password auth, read replica fallbacks, operator-pinned URL preservation, and percent-encoding; add coverage test asserting gateway + backend allowlist union equals the full proxy route set - Add pydantic-settings>=2.14.1 as a proxy extra dependency and update liccheck allowlist Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-16 09:25:17 -07:00
Yuneng Jiang	e838a40049	uv lock	2026-05-13 21:51:29 -07:00
Yuneng Jiang	8686001b3b	build(packaging): raise jinja2 floor to 3.1.6 Our `uv.lock` already resolves jinja2 to 3.1.6, so Docker / CI installs get that version. The `pyproject.toml` floor was lagging at 3.1.0, which means downstream consumers using `--resolution=lowest-direct` or older constraint files can land on 3.1.0-3.1.5 instead of the version we actually test against. Aligns the declared floor with the resolved version so external installers see the same baseline our test matrix exercises. `uv lock` diff is metadata-only (no resolved-version drift).	2026-05-09 13:50:22 -07:00
Yuneng Jiang	086a23753e	uv lock	2026-05-07 16:30:15 -07:00
Yuneng Jiang	9ae9b81c1b	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/nifty-kilby-82870d # Conflicts: # uv.lock	2026-05-07 16:10:22 -07:00
Sameer Kankute	e912e6d4ff	feat(audio_transcription): add NVIDIA Riva STT provider (#27185 ) * feat(audio_transcription): add NVIDIA Riva STT provider Adds nvidia_riva as a new audio transcription provider, supporting both NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming. - Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy, audioread fallback) so callers can send any common format. - Maps OpenAI params: language (en -> en-US), response_format (text/json/ verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets, word offsets converted ms -> s for verbose_json. - Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted otherwise (SSL off by default), with explicit use_ssl override. - gRPC errors wrapped via NvidiaRivaException -> litellm exception classes. - Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client, soundfile, audioread, numpy). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(nvidia_riva): address PR review feedback - handler: forward call-level `timeout` to streaming_response_generator (kwarg-detected via inspect for older riva-client compat) so a stalled Riva server cannot block the caller indefinitely. - audio_utils: spill bytes to a tempfile before audioread.audio_open; most audioread backends (FFmpeg, GStreamer) require a real filesystem path and previously raised TypeError on BytesIO, breaking the mp3/m4a fallback path. - audio_utils: prefer soxr / scipy.signal.resample_poly for resampling (anti-aliased polyphase) when installed, falling back to linear only as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples. - transformation: bare `es` now maps to es-ES (Castilian) instead of es-US, matching BCP-47 conventions. 
Co-authored-by: Cursor <cursoragent@cursor.com> * chore: trigger CI re-run [stabilize loop 1/3] * Update litellm/llms/nvidia_riva/audio_transcription/transformation.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * chore: trigger CI re-run [stabilize loop 1/3] * fix code qa * fix lint * fix mypy * fix mypy * Fix NVIDIA Riva ASR service lookup * Fix NVIDIA Riva transcription payload logging --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>	2026-05-05 17:17:51 -07:00
Yuneng Jiang	201fa5d42b	[Infra] Packaging: Drop floor-check workflow + bound importlib-metadata Removing the new check-dependency-floors.yml workflow. It only fires when pyproject.toml changes, which is rare; for those PRs, a maintainer can run the same check by hand with one command. Documented that command in a pyproject.toml comment next to the deps. Also adds the missing upper bound on importlib-metadata (>=8.0.0,<9.0) for consistency with every other entry in the list.	2026-05-05 16:51:56 -07:00
yuneng-jiang	e84282b7b3	[Infra] Bump deps (#27157) * bump: version 0.4.70 → 0.4.71 * bump: version 0.1.39 → 0.1.40 * uv lock	2026-05-05 15:58:05 -07:00
Yuneng Jiang	eff0f8c630	[Infra] Packaging: Bump compiled-dep floors for cp313 wheel coverage CI matrix on Python 3.13 caught three floors that predate cp313 prebuilt wheels and would force users into a Rust/C build: - tiktoken: 0.7.0 -> 0.8.0 (cp313 wheels start at 0.8) - tokenizers: 0.20.0 -> 0.21.0 (cp313 wheels start at 0.21; sdist's pyproject.toml pre-0.21 is also malformed for modern build backends) - pydantic: 2.5.0 -> 2.10.0 (pydantic-core cp313 wheels start at 2.27, shipped with pydantic 2.10) Verified locally on Python 3.10 and 3.13: install at lowest-direct + import litellm + import every openai-namespace symbol the codebase uses all pass.	2026-05-05 15:48:59 -07:00
Yuneng Jiang	3d55afe38b	[Infra] Packaging: Relax Core Runtime Pins To Ranges The 12 core `[project.dependencies]` entries in pyproject.toml were exact `==` pins, a side effect of the Poetry → uv migration. This forces every downstream package that lists litellm as a dependency to downgrade common runtime libraries (openai, pydantic, aiohttp, click, jsonschema, ...) to the exact versions we ship. Customers have flagged this as a coexistence blocker. Switch to lower-bounded ranges with upper bounds where the upstream package is pre-1.0 or has a known breaking-major-version policy. Reproducibility for our Docker proxy and CI continues to come from `uv.lock`, which is regenerated here as a metadata-only diff (no resolved versions or hashes change). Inspired by #26157 (which got stranded on `litellm_oss_staging_04_21_2026` when the forward-merge to internal staging in #26216 was closed). Floors in this PR are tighter than #26157's: they were validated by installing litellm at `--resolution=lowest-direct` and importing the openai-namespace symbols the codebase actually uses. Floor highlights vs #26157: - openai >= 2.20 (was 2.0) — Responses API symbols + `Omit` need a 2.x mid-range floor - httpx >= 0.28, < 1.0 (was no upper) — pre-1.0 - importlib-metadata >= 8.0 (was 6.0) — stay in tested major - tokenizers >= 0.20, < 1.0 (was 0.19, no upper) — pre-1.0 - aiohttp >= 3.10, < 4.0 (was no upper) — bound major - pydantic >= 2.5, < 3.0 — kept - All other floors: keep tested major, add upper bound Adds a `check-dependency-floors.yml` GitHub Actions workflow that installs litellm at `--resolution=lowest-direct` on Python 3.10 and 3.13 and import-checks every openai symbol the codebase uses, so a future floor regression fails fast in CI rather than silently in the field.	2026-05-05 15:45:13 -07:00
user	bfdd786962	chore(deps): refresh dependency locks	2026-05-04 11:36:18 -07:00
Mateo Wang	05439530c2	Merge branch 'litellm_internal_staging' into litellm_vcr-cassette-llm-tests-af37	2026-05-01 14:37:48 -07:00
Yuneng Jiang	6da13efcec	uv lock	2026-04-30 21:40:09 -07:00
Cursor Agent	05333e42ba	tests(llm_translation): switch to pytest-recording for marker-based bulk capture Per Yuneng's feedback, use a single @pytest.mark.vcr marker so one record sweep populates cassettes for every marked test across all providers, instead of forcing each test to bind to a hard-coded cassette path. Changes vs. the initial scaffolding: - Add 'pytest-recording==0.13.4' on top of vcrpy. Adopt its layout: cassettes live at 'cassettes/<test_module>/<test_name>.yaml', resolved automatically. New tests just decorate with '@pytest.mark.vcr' — no imports or path bookkeeping. - Move the shared filter/match config into a 'vcr_config' fixture in 'tests/llm_translation/conftest.py' (consumed by pytest-recording for every marked test in the dir). Drop the standalone 'vcr_config.py'. - Bulk record / replay via the standard '--record-mode' CLI flag: 'make test-llm-translation-record' now sweeps every '@pytest.mark.vcr' test under tests/llm_translation in one shot. Optional 'TARGET=' var scopes to a single file. - Move existing cassettes to the per-test paths and update the local in-process Anthropic regenerator to write to the same paths. - Refresh README + Makefile target docs to match the sweep workflow. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-04-30 18:08:57 +00:00
Cursor Agent	94b319c577	tests(llm_translation): add VCR cassette infrastructure for offline replay Live LLM e2e tests have been draining provider billing accounts and going flaky on outages (LIT-2683). This change introduces vcrpy-backed cassette replay so CI can exercise the same end-to-end LiteLLM transformation paths without hitting the live provider: - Add 'vcrpy==8.1.1' to the dev dependency group. - New 'tests/llm_translation/vcr_config.py' centralises the VCR config: filters auth/secret headers and per-request response headers, matches on method+URI+body, and exposes 'LITELLM_VCR_RECORD_MODE' for re-recording. - New 'tests/llm_translation/test_anthropic_completion_vcr.py' demonstrates the pattern with one non-streaming and one streaming Anthropic test that replay from cassettes shipped under 'cassettes/'. - New 'tests/llm_translation/cassettes/_record_anthropic_fixtures.py' lets contributors regenerate the canned Anthropic cassettes against a local in-process mock (no API key required), and 'cassettes/README.md' documents the full record/replay/refresh workflow. - New 'make test-llm-translation-record FILE=...' Makefile target to refresh cassettes against the live API. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-04-30 00:45:50 +00:00
Yuneng Jiang	b4d9006f92	uv lock	2026-04-28 17:43:36 -07:00
Yuneng Jiang	a10fff888d	uv lock	2026-04-25 19:32:41 -07:00
user	4d74a30412	chore(deps): fix brace-expansion pin and revert risky dev bumps - Dockerfile: pin the unscoped `brace-expansion@5.0.5` alongside `@isaacs/brace-expansion@5.0.1`. The scoped package only has 5.0.0 and 5.0.1 published; CVE-2026-33750's fix (5.0.5) is on the unscoped package which npm also vendors. The override loop now swaps both. - Revert `black` 26.3.1 -> 24.10.0, `pytest` 9.0.3 -> 8.3.5, and `pytest-asyncio` 1.3.0 -> 1.2.0. The major-version bumps cause CI lint (black reformats hundreds of files) and code-quality (liccheck.ini has no entry for the new versions) failures. Both CVEs are dev-only; skipping leaves no runtime exposure.	2026-04-24 00:37:07 +00:00
user	fed1a14646	chore(deps): bump vulnerable dependencies Closes Nexus IQ policy violations and open Dependabot alerts for shipped Python deps and runtime-stage npm pins in the Docker image.	2026-04-24 00:36:59 +00:00
Yuneng Jiang	ffaeff54cd	add uv	2026-04-23 17:00:20 -07:00
Yuneng Jiang	95fa7678af	uv lock	2026-04-22 18:25:37 -07:00
Yuneng Jiang	e65d547c4d	adding uv lock	2026-04-21 18:10:47 -07:00
ishaan-berri	2f22a1293e	bump litellm-proxy-extras to 0.4.67 (#26043) * bump litellm-proxy-extras version to 0.4.67 * bump litellm-proxy-extras pin to 0.4.67 in litellm pyproject * regenerate uv.lock for litellm-proxy-extras 0.4.67 * bump litellm-enterprise version to 0.1.38 * bump litellm-enterprise pin to 0.1.38 in litellm pyproject * regenerate uv.lock for litellm-enterprise 0.1.38	2026-04-18 19:03:56 -07:00
Yuneng Jiang	49ba6b8160	add uv lock	2026-04-18 18:43:09 -07:00
Yuneng Jiang	9bdb3b1772	chore: lower python floor from 3.11 to 3.10 All three dependency bumps in this PR resolve on Python 3.10, so there is no need to jump the floor all the way to 3.11. Also restore the py3.10-specific lunary==1.4.36 pin that was collapsed when the floor was temporarily at 3.11.	2026-04-18 12:50:04 -07:00
Yuneng Jiang	d1e665742b	chore: drop stale python_version markers after floor raise Now that requires-python starts at 3.11, the "python_version >= '3.9'" and ">= '3.10'" markers are unconditionally true, and the "< '3.10'" entries for psycopg, Pillow, pyarrow, langchain, lunary, and pylint can never resolve. Drop the dead markers and remove the unreachable pins so the dependency list reflects what actually gets installed.	2026-04-18 12:31:53 -07:00
Yuneng Jiang	1c29c5e903	chore: bump proxy deps and raise python floor to 3.11 Bumps orjson, fastapi-sso, and python-multipart to their latest releases in the proxy extra, and raises the project python floor to 3.11 so the updated pins can resolve. CI already runs on 3.11 / 3.12 / 3.13 and the Docker images ship python 3.13, so the floor change aligns the declared support range with what is actually tested and shipped.	2026-04-18 12:16:35 -07:00
Ishaan Jaffer	375cfb7f95	chore: update uv.lock after merging main	2026-04-17 12:56:23 -07:00
Yuneng Jiang	c294bbe4f0	fix(deps): pin langgraph-prebuilt==1.0.8 to avoid broken 1.0.9 langgraph-prebuilt 1.0.9 imports ExecutionInfo and ServerInfo from langgraph.runtime, but those symbols are not exported until langgraph 1.1.0. Our pin of langgraph==1.0.10 allows langgraph-prebuilt<1.1.0,>=1.0.8, and uv resolves to 1.0.9 (the latest in range), which breaks at import time in every test that touches langgraph.prebuilt (e.g. tests/pass_through_tests/test_mcp_routes.py): ImportError: cannot import name 'ExecutionInfo' from 'langgraph.runtime' Pinning langgraph-prebuilt to 1.0.8 pairs correctly with langgraph==1.0.10 and restores the import path.	2026-04-16 09:36:05 -07:00
Yuneng Jiang	dafa1bf97c	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_apr15 # Conflicts: # litellm/litellm_core_utils/litellm_logging.py # uv.lock	2026-04-16 09:17:20 -07:00
Brendan Smith-Elion	265a960472	fix(noma-v2): fall back to key_alias for application_id in Noma dashboard (#25795 ) Noma v1 resolved application_id from user_api_key_alias when no explicit value was set (PR #16832). Noma v2 (PR #21400) was rewritten from scratch and this fallback was not ported, causing all requests from shared LiteLLM instances to appear as a single generic "litellm" application in the Noma dashboard — breaking per-user traceability. Fix: after checking dynamic_params and self.application_id, fall back to user_api_key_alias from litellm_metadata or metadata. This matches the pattern used by PromptSecurityGuardrail._resolve_key_alias_from_request_data() and restores the v1 behavior where each API key gets its own application entry in the Noma dashboard. Fixes #25794 Co-authored-by: Brendan Smith-Elion <brendan.smith-elion@arcadia.io> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 19:04:24 +05:30
Ishaan Jaffer	9114b0da96	fix(ci): sync uv.lock with pyproject.toml	2026-04-15 18:16:22 -07:00
jayden	0a1b4427a6	fix(guardrails): replace custom_code sandbox with RestrictedPython	2026-04-15 15:13:52 -07:00
Yuneng Jiang	83c459225c	[Fix] CI: fix GHA timeouts and uv lock --check failures 1. exclude-newer: change from absolute "2026-04-10" to relative "3 days". All pinned deps were published before the 3-day cutoff. Re-locked so uv lock --check passes in test-mcp.yml and test-linting.yml. 2. test_eager_tiktoken_load: run all 10 env var values in a single subprocess instead of spawning 10 separate processes. Each cold import litellm takes ~78s on CI, so the old loop took ~13 min on a single xdist worker. Now takes ~78s total. 3. proxy-db remaining timeout: increase from 20 to 30 minutes. The remaining group has 51 test files and was consistently timing out at 71% across all branches (pre-existing issue, not migration-related).	2026-04-11 09:04:49 -07:00
Yuneng Jiang	d9a460277a	[Fix] CI: fix uv lock resolution and tiktoken test timeout 1. Cap requires-python to <3.14 — no deps ship 3.14 wheels yet, and uv's cross-version resolver fails on the Python 3.14 split. 2. Change exclude-newer from relative "30 days" to absolute "2026-04-10" so the lockfile stays reproducible. The relative date caused cryptography==46.0.7 (published April 8) to fall outside the window. 3. Parametrize test_eager_loading_env_var_values instead of looping — with xdist the 6 subprocess cases can run in parallel instead of all running sequentially on one worker (~13 min → ~2 min). Also removed redundant case variants (Yes/YES/On/ON) that test the same str_to_bool code path.	2026-04-10 22:21:15 -07:00
user	8d1493ed08	fix(security): bump vulnerable dependencies pip: - cryptography 43.0.3 → 46.0.7 (5 CVEs including CVSS 8.2 ECDH key leak) npm: - hono 4.1.4/4.12.7 → 4.12.12 (prototype pollution, cookie injection, path traversal, middleware bypass, IP matching bypass) - @hono/node-server 1.19.6 → 1.19.13 (serveStatic middleware bypass) - vite 7.3.1 → 7.3.2 (file read via WebSocket, path traversal, fs.deny bypass) - lodash override 4.17.23 → 4.18.1 (code injection via _.template, prototype pollution via _.unset/_.omit) mlflow left at 3.9.0 — 2 of 3 alerts have no upstream fix, and 3.11.1 is blocked by exclude-newer (transitive dep chain). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 19:35:19 +00:00
stuxf	a6c30b30bf	build: migrate packaging, CI, and Docker from Poetry to uv (#25007 ) * build: migrate packaging metadata to uv * ci: move automation and local tooling to uv * docker: migrate image builds and runtime setup to uv * docs: update install and deployment guidance for uv * chore: align auxiliary scripts and tests with uv * test: harden test_litellm isolation * fix: keep release and health check images self-contained * build: pin uv tooling and health check deps * test: isolate bedrock image request formatting from suite state * test: cover sandbox executor requirements flow * ci: fix circleci no-op command steps * ci: fix circleci publish workflow parsing * fix: stabilize remaining uv migration CI checks * ci: increase matrix test timeout headroom * fix: restore published docker and license coverage * fix: restore proxy runtime build parity * fix: restore proxy extras parity and venv migrations * ci: persist uv path across circleci steps * fix: keep psycopg binary in default test env * docker: preserve prisma cache across stages * test: run local proxy checks through uv python * build: restore runtime deps moved into ci * build: refresh uv lock after upstream merge * fix: restore module import in test_check_migration after merge The conflict resolution imported only the function but the test body references check_migration as a module throughout. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching - Move google-generativeai, Pillow, tenacity back to ci group (they are lazily imported and bloat the base SDK install needlessly) - Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant in Docker where system Node.js is already installed via apk) - Remove all nodejs-wheel node replacement and venv npm patching blocks from Dockerfiles since the wheel is no longer installed - Add --no-default-groups to CodSpeed benchmark workflow so the benchmark environment matches the old minimal pip install footprint - Apply standard uv two-phase Docker pattern: copy metadata first, install deps (cached layer), then copy source and install project - Replace CircleCI enterprise no-op with proper uv sync command Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate uv.lock after removing nodejs-wheel-binaries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): use cache/restore instead of cache to prevent cache poisoning The old workflow used actions/cache/restore (read-only). The uv migration changed it to actions/cache (read-write), which zizmor flags as a cache poisoning risk. Restore the safer read-only variant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert The setup-uv action enables caching by default, which zizmor flags as a cache poisoning risk. Disable it since we already use a read-only cache/restore step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): disable setup-uv cache in publish workflow Silences zizmor cache-poisoning alert. Publishing workflow runs infrequently on protected branches so caching adds no real benefit. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(test): remove duplicate verbose_logger mock in test_check_migration The logger was patched twice — first via mocker.patch() then via mocker.patch.object(autospec=True). The second call fails because autospec cannot inspect an already-mocked attribute. Remove the redundant first patch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): free disk space before Docker build in test-server-root-path The Dockerfile.non_root build ran out of disk on the CI runner. Remove Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 11:46:23 -07:00
Ryan Malloy	f76938af5e	fix(ollama): set finish_reason to tool_calls and remove broken capability check (#18924 ) * Update CLAUDE.md with qwen3 tool_calls bug fix instructions (#18922) * fix(ollama): set finish_reason to "tool_calls" when tool_calls present When qwen3 models return tool_calls through Ollama, the finish_reason was incorrectly left as "stop" instead of being set to "tool_calls". This caused clients to miss the tool_calls in the response. Added _get_finish_reason helper method following OpenAI provider's pattern, and fixed both streaming and non-streaming response paths. Fixes: https://github.com/BerriAI/litellm/issues/18922 * fix(ollama): pass tools directly without model capability check The previous code tried to check model capability via get_model_info() which made network calls to localhost:11434. When Ollama is remote, this fails and falls back to JSON format, breaking tool calling. Ollama 0.4+ supports native tool calling - let Ollama handle model capability detection instead of LiteLLM. Fixes #18922 * fix(ollama): transform tool_calls response to OpenAI format Ollama returns tool_calls with arguments as dict, but OpenAI format requires arguments to be a JSON string. Also ensures 'type': 'function' field is present. Completes the fix for #18922 * fix(ollama): set finish_reason to "tool_calls" when tool_calls present Fixes #18922 Two issues addressed: 1. Remove broken model capability check - get_model_info() fails when Ollama runs on remote server - Broken fallback triggered JSON prompt injection - Now passes tools directly - Ollama 0.4+ handles detection 2. Set finish_reason correctly - Was hardcoded to "stop" even with tool_calls present - Clients use this to know how to process the response - Now returns "tool_calls" when tool_calls are in response Both streaming and non-streaming responses are fixed. Tests: - All 14 existing Ollama tests pass - Added 3 focused tests for the fixes	2026-01-14 03:52:26 +05:30
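
The response-shape fixes described in commit f76938af5e can be illustrated with a minimal sketch. It assumes an Ollama-style tool call whose `arguments` arrive as a dict; `normalize_tool_call` and `pick_finish_reason` are hypothetical names chosen for illustration and are not LiteLLM's actual helpers.

```python
import json
from typing import Any


def normalize_tool_call(tool_call: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an Ollama-style tool call in OpenAI's shape (hypothetical helper)."""
    function = dict(tool_call.get("function", {}))
    arguments = function.get("arguments", {})
    # OpenAI-format clients expect `arguments` as a JSON string, not a dict.
    if not isinstance(arguments, str):
        function["arguments"] = json.dumps(arguments)
    # Make sure the `type` field is present alongside the function payload.
    return {**tool_call, "type": "function", "function": function}


def pick_finish_reason(message: dict[str, Any]) -> str:
    """Report "tool_calls" whenever the message carries tool calls (hypothetical helper)."""
    return "tool_calls" if message.get("tool_calls") else "stop"
```

A client that branches on `finish_reason` would then see `"tool_calls"` for any message that carries a non-empty `tool_calls` list, in both the streaming and non-streaming paths the commit mentions.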

49 Commits
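
The license-check fix described in commit 5e2d75d75d (prefer an exact or lower-bound version over an arbitrary specifier clause) might look roughly like the sketch below. It is an illustration only: `lookup_version` is a hypothetical name, and plain string parsing stands in for the specifier objects the commit message refers to, so that the example has no third-party dependencies. It is not the code in check_licenses.py.

```python
import re
from typing import Optional

# One specifier clause such as ">=0.12.1" or "<1.0".
_CLAUSE = re.compile(r"^\s*(==|~=|>=|<=|!=|>|<)\s*([A-Za-z0-9.+!_-]+)\s*$")


def lookup_version(specifier: str) -> Optional[str]:
    """Pick a version likely to exist on PyPI: an exact pin, else the floor."""
    clauses = []
    for part in specifier.split(","):
        match = _CLAUSE.match(part)
        if match:
            clauses.append((match.group(1), match.group(2)))
    # Preference order: exact pin first, then lower bounds. Upper bounds such
    # as "<1.0" are never returned because that release may not exist.
    for wanted in ("==", ">=", "~="):
        for operator, version in clauses:
            if operator == wanted:
                return version
    return None
```

With this ordering, a range like `>=0.12.1,<1.0` yields `0.12.1` regardless of clause order, which avoids looking up a nonexistent `1.0` release.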