49 Commits

Author SHA1 Message Date
yuneng-jiang
bac2590b39
build(deps): bump pyjwt to 2.13.0 and ws override to 8.20.1 (#29982)
Raise the PyJWT floor in pyproject (>=2.13.0,<3.0) and re-resolve uv.lock so
the proxy installs 2.13.0 instead of 2.12.0. Bump the ws transitive-version
override in the dashboard from 8.19.0 to 8.20.1 and regenerate package-lock;
jsdom and openai both dedupe onto the single 8.20.1 copy.

Both are routine dependency maintenance bumps to keep pinned versions current.
2026-06-08 16:39:21 -07:00
yuneng-jiang
f1667b9137
chore(deps): bump deps (#29860)
* bump: version 0.4.73 → 0.4.74

* bump: version 1.88.0 → 1.89.0

* uv lock
2026-06-06 21:44:54 +00:00
yuneng-jiang
28c0d8579b
chore(deps): bump deps (#29373)
* bump: version 0.1.41 → 0.1.42

* uv lock
2026-05-30 20:41:23 -07:00
Yassin Kortam
d82eb33a60
feat(otel): typed semconv-aligned OpenTelemetry instrumentation (#28909) 2026-05-29 23:15:27 -07:00
yuneng-jiang
ffc113b428
chore(ci): bump version (#29242)
* bump: version 1.87.0 → 1.88.0

* uv lock
2026-05-28 18:49:04 -07:00
yuneng-jiang
5e2d75d75d
bump deps (#29208) (#29226)
* fix(deps): bump vulnerable proxy dependencies (starlette/fastapi, granian, pyarrow, semantic-router)

Resolve known CVEs flagged by osv-scanner/grype against uv.lock. All bumped
versions verified to resolve, install, and pass the proxy auth/route/middleware
unit suites (717 tests) plus an import smoke on the new stack.

- starlette 0.50.0 -> 1.1.0 (CVE-2026-48710 "BadHost", GHSA-86qp-5c8j-p5mr):
  versions <1.0.1 reconstruct request.url from the unvalidated Host header,
  poisoning request.url.path. Required raising fastapi 0.124.4 -> 0.136.3,
  which dropped fastapi's starlette<0.51.0 cap; an explicit starlette>=1.0.1
  floor blocks regression to a vulnerable transitive resolution. The proxy's
  own auth already reads scope["path"] via get_request_route, but the locked
  starlette still flagged in container scanners and left other request.url
  consumers exposed.
- granian 2.5.7 -> 2.7.4 (CVE-2026-42544, unauthenticated DoS via WebSocket
  subprotocol header panic; CVE-2026-42545, WSGI response-header-panic DoS).
  granian is a selectable proxy server (proxy_cli).
- pyarrow 22.0.0 -> 23.0.1 (CVE-2026-25087 / PYSEC-2026-113).
- semantic-router 0.1.12 -> 0.1.15: 0.1.12 was yanked (CVE-2026-42208 — its
  unbounded litellm pin could resolve a credential-exfiltrating litellm==1.82.8
  wheel).

Not fixable by bump: diskcache 5.6.3 (CVE-2025-69872, unsafe pickle
deserialization) has no upstream fix and is left pinned; exploiting it requires
write access to the local cache directory.

Relock side effect: sse-starlette 3.4.2 -> 3.4.4.

* deps: relax exact pins in optional extras to compatible ranges

The proxy/optional extras exact-pinned every dependency, which (1) forces
downstream `pip install litellm[proxy]` consumers into version lockstep and
(2) blocks them from pulling transitive security patches without forking — the
structural cause behind needing a litellm release to clear the starlette CVE in
the previous commit.

Convert the ordinary extras deps to `>=current,<next_major` ranges, mirroring
the core [project].dependencies style. Reproducibility for litellm's own
Docker/CI is unaffected: images install via `uv sync --frozen`, and the lock
re-resolves to the identical versions (no locked version changed).
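
The `>=current,<next_major` conversion described above can be sketched as a small string rewrite. This is only an illustration: `relax_pin` and the package name `example-pkg` are hypothetical, and the commit does not say the change was scripted.

```python
import re

# Matches an exact pin such as "example-pkg==0.12.1" (hypothetical helper regex).
_PIN = re.compile(r"^\s*([A-Za-z0-9_.\-\[\]]+)\s*==\s*(\d+)((?:\.\d+)*)\s*$")


def relax_pin(requirement: str) -> str:
    """Rewrite "name==X.Y.Z" as "name>=X.Y.Z,<(X+1).0"; leave other specs alone."""
    match = _PIN.match(requirement)
    if match is None:
        # Already a range (or something this sketch does not understand).
        return requirement.strip()
    name, major, rest = match.groups()
    floor = f"{major}{rest}"
    return f"{name}>={floor},<{int(major) + 1}.0"


print(relax_pin("example-pkg==0.12.1"))
print(relax_pin("example-pkg>=1.0"))
```

For a pre-1.0 package the next major is `1.0`, which yields the same `>=0.12.1,<1.0` shape as the range quoted in the license-check fix further down.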

Kept exact-pinned:
- litellm-proxy-extras, litellm-enterprise — litellm's own sub-packages,
  versioned in lockstep with the release.
- opentelemetry-api/sdk/exporter-otlp — must resolve to matching versions.
- grpcio — supply-chain-pinned to a vetted, aged release.

Also corrects the stale comment claiming the extras are exact-pinned for Docker
reproducibility (the images use the lock, not these pins).

* fix(ci): resolve license-check lookup version from the floor for ranged deps

check_licenses.py derived the PyPI lookup version with
`next(iter(req.specifier))`, which returns an arbitrary specifier clause. For
a range like `>=0.12.1,<1.0` it picked the upper bound (`1.0`) — a version
that doesn't exist on PyPI — so the license lookup 404'd and the package was
flagged as having an unknown license.

The previous commit's switch from exact pins to ranges exposed this for
soundfile, pyroscope-io, redisvl, diskcache, and mlflow (the ranged deps not
already in liccheck.ini's allowlist). Prefer a lower-bound/exact version (a
real released version) for the lookup.
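
A minimal sketch of the "prefer the floor" selection, assuming a comma-separated specifier string. The real script works on `packaging` requirement objects; this version uses only the standard library so it runs on its own, and `lookup_version` is a hypothetical name.

```python
import re
from typing import Optional

# One clause of a version specifier, e.g. ">=0.12.1" or "<1.0".
_CLAUSE = re.compile(r"^\s*(===|==|~=|!=|<=|>=|<|>)\s*([^\s,]+)\s*$")

# Operators whose version is expected to be a real released version,
# in order of preference: exact pins first, then lower bounds.
_PREFERRED = ("==", "===", ">=", "~=")


def lookup_version(specifier: str) -> Optional[str]:
    """Pick an exact or lower-bound version; never return an upper bound."""
    clauses = []
    for part in specifier.split(","):
        match = _CLAUSE.match(part)
        if match is not None:
            clauses.append(match.groups())
    for wanted in _PREFERRED:
        for operator, version in clauses:
            if operator == wanted:
                return version
    return None


print(lookup_version(">=0.12.1,<1.0"))
print(lookup_version("<1.0,>=0.12.1"))
```

Both calls select `0.12.1` regardless of clause order, which is the property the arbitrary `next(iter(...))` pick lacked.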

* fix(proxy): set strict_content_type=False on the FastAPI app

Starlette 1.0 / FastAPI 0.13x flipped the default to strict_content_type=True,
which refuses to parse a JSON request body when the client omits the
Content-Type header. The proxy previously accepted those requests, so the
fastapi/starlette bump in this PR would silently break clients that don't send
a Content-Type. Restore the prior lenient behavior explicitly.

Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
2026-05-28 16:48:14 -07:00
harish-berri
d04373f4ce
Add granian as an ASGI-compliant web server. Provides better throughput stability (#26027)
* Add granian as an ASGI-compliant web server. Provides better stability, 10-20 RPS improvement under standard LT conditions.

TODO: Verify poetry lock details and add locust numbers to PR

* Update granian version in license_cache.json and pyproject.toml to 2.5.7

* Enhance proxy CLI tests by adding SSL initialization checks for Granian server. Remove Python version skip conditions and implement tests to ensure SSL certificate and key are required for server initialization.

* update uv lock to fix granian import error
2026-05-21 19:08:37 -07:00
yuneng-jiang
2a5dfcd5bc
build(deps-dev): bump black to 26.3.1 and apply formatting (#28525)
* build(deps-dev): bump black 24.10.0 -> 26.3.1

* style: apply black 26.3.1 formatting

* chore: authorize black 26.3.1 license in liccheck.ini
2026-05-21 17:24:18 -07:00
yuneng-jiang
1480ec698b
chore(ci): bump versions (#28287)
* bump: version 0.4.72 → 0.4.73

* bump: version 1.86.0 → 1.87.0

* uv lock
2026-05-19 15:10:37 -07:00
yuneng-jiang
cf9b5e4fa7
[Infra] Bump versions (#28094)
* bump: version 0.1.40 → 0.1.41

* bump: version 1.85.0 → 1.86.0

* add uv lock
2026-05-16 18:31:43 -07:00
yuneng-jiang
fbb39ef94d
build(deps): pin openai==2.33.0 in uv.lock (#28088)
openai 2.34.0 began rejecting an explicitly-passed empty-string api_key
at client construction (raises OpenAIError before any request), which
broke tests/local_testing/test_exceptions.py::test_exception_with_headers
and related cases after uv.lock floated openai 2.33.0 -> 2.36.0.

Pin back to 2.33.0 (within the existing pyproject >=2.20.0,<3.0.0 range)
as a temporary stopgap; longer-term fix to follow.
2026-05-16 14:49:31 -07:00
Yassin Kortam
014cb8fa9d
feat: add componentized proxy deployment with gateway, backend, ui, and migrations (#27557)
Split the monolithic LiteLLM proxy into independently scalable Kubernetes components to allow separate horizontal scaling of the LLM data plane and management API surfaces.

- Add DatabaseURLSettings pydantic-settings model that assembles DATABASE_URL (and optional DATABASE_URL_READ_REPLICA) from discrete DATABASE_* env vars before Prisma initializes, supporting both IAM token auth (minting short-lived RDS tokens) and password auth; replaces the CLI-only path that componentized entrypoints bypass
- Add gateway component (port 4000) that trims the proxy route table to the LLM data-plane surface (chat, embeddings, completions, audio, realtime, provider passthroughs, health/metrics) via an allowlist applied inside the lifespan context so plugin-registered routes are captured
- Add backend component (port 4001) that exposes the management/admin surface (keys, users, teams, orgs, spend analytics, model management, SSO, audit logs) with a complementary allowlist
- Add ui component — Next.js static export served by nginx (port 3000) with RSC payload routing, asset prefix aliasing, and SPA fallback for dashboard routes
- Add migrations component with dedicated Dockerfile that runs prisma migrate deploy via a Helm pre-install/pre-upgrade Job, eliminating per-pod schema contention on the Prisma advisory lock
- Add Helm chart (helm/litellm) with separate Deployments, Services, HPAs, and ConfigMap for each component; shared _helpers.tpl emits DATABASE_*, IAM_TOKEN_DB_AUTH, REDIS_*, and DISABLE_SCHEMA_UPDATE env vars from chart values; ingress template routes traffic to the correct component by path prefix
- Add comprehensive tests for DatabaseURLSettings covering IAM auth, password auth, read replica fallbacks, operator-pinned URL preservation, and percent-encoding; add coverage test asserting gateway + backend allowlist union equals the full proxy route set
- Add pydantic-settings>=2.14.1 as a proxy extra dependency and update liccheck allowlist
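
To illustrate the URL assembly that the first bullet describes, here is a rough sketch of password-auth assembly with percent-encoding. It is not the `DatabaseURLSettings` model itself: the function name and the individual variable names (`DATABASE_USERNAME`, `DATABASE_HOST`, and so on) are assumptions, since the commit only says the inputs are discrete `DATABASE_*` env vars.

```python
from typing import Mapping
from urllib.parse import quote


def build_database_url(env: Mapping[str, str]) -> str:
    """Assemble a PostgreSQL URL from discrete settings (hypothetical key names)."""
    # An operator-pinned DATABASE_URL is preserved as-is.
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    # Percent-encode credentials so characters like "@" or "/" cannot
    # be mistaken for URL delimiters.
    user = quote(env["DATABASE_USERNAME"], safe="")
    password = quote(env["DATABASE_PASSWORD"], safe="")
    host = env["DATABASE_HOST"]
    port = env.get("DATABASE_PORT", "5432")
    name = quote(env["DATABASE_NAME"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


print(
    build_database_url(
        {
            "DATABASE_USERNAME": "app",
            "DATABASE_PASSWORD": "p@ss/w:rd",
            "DATABASE_HOST": "db.internal",
            "DATABASE_NAME": "litellm",
        }
    )
)
```

IAM token auth would presumably swap the password for a freshly minted token before the same encoding step; that part is left out here.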

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-16 09:25:17 -07:00
Yuneng Jiang
e838a40049
uv lock 2026-05-13 21:51:29 -07:00
Yuneng Jiang
8686001b3b
build(packaging): raise jinja2 floor to 3.1.6
Our `uv.lock` already resolves jinja2 to 3.1.6, so Docker / CI installs
get that version. The `pyproject.toml` floor was lagging at 3.1.0,
which means downstream consumers using `--resolution=lowest-direct` or
older constraint files can land on 3.1.0-3.1.5 instead of the version
we actually test against.

Aligns the declared floor with the resolved version so external
installers see the same baseline our test matrix exercises.

`uv lock` diff is metadata-only (no resolved-version drift).
2026-05-09 13:50:22 -07:00
Yuneng Jiang
086a23753e
uv lock 2026-05-07 16:30:15 -07:00
Yuneng Jiang
9ae9b81c1b
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/nifty-kilby-82870d
# Conflicts:
#	uv.lock
2026-05-07 16:10:22 -07:00
Sameer Kankute
e912e6d4ff
feat(audio_transcription): add NVIDIA Riva STT provider (#27185)
* feat(audio_transcription): add NVIDIA Riva STT provider

Adds nvidia_riva as a new audio transcription provider, supporting both
NVCF-hosted and self-hosted Riva ASR deployments via gRPC streaming.

- Auto-resamples input audio to 16 kHz mono LINEAR_PCM (soundfile + numpy,
  audioread fallback) so callers can send any common format.
- Maps OpenAI params: language (en -> en-US), response_format (text/json/
  verbose_json), timestamp_granularities=["word"] -> enable_word_time_offsets,
  word offsets converted ms -> s for verbose_json.
- Auth: NVCF when nvcf_function_id is set (SSL on by default), self-hosted
  otherwise (SSL off by default), with explicit use_ssl override.
- gRPC errors wrapped via NvidiaRivaException -> litellm exception classes.
- Optional deps gated behind [stt-nvidia-riva] extra (nvidia-riva-client,
  soundfile, audioread, numpy).
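
The parameter mapping in the list above can be sketched roughly as follows. The function names and the `start_ms` / `end_ms` field names are assumptions made for the example, and only the `en` row of the table comes from the commit text.

```python
# Hypothetical lookup table: bare language codes to region-qualified codes.
LANGUAGE_CODES = {"en": "en-US"}


def to_riva_language(language: str) -> str:
    """Map a bare code to its default region; pass other codes through unchanged."""
    return LANGUAGE_CODES.get(language, language)


def word_offsets_to_seconds(words: list) -> list:
    """Convert word offsets from milliseconds to seconds for verbose_json output."""
    return [
        {
            "word": word["word"],
            "start": word["start_ms"] / 1000.0,
            "end": word["end_ms"] / 1000.0,
        }
        for word in words
    ]


print(to_riva_language("en"))
print(word_offsets_to_seconds([{"word": "hi", "start_ms": 250, "end_ms": 1000}]))
```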

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(nvidia_riva): address PR review feedback

- handler: forward call-level `timeout` to streaming_response_generator
  (kwarg-detected via inspect for older riva-client compat) so a stalled
  Riva server cannot block the caller indefinitely.
- audio_utils: spill bytes to a tempfile before audioread.audio_open;
  most audioread backends (FFmpeg, GStreamer) require a real filesystem
  path and previously raised TypeError on BytesIO, breaking the mp3/m4a
  fallback path.
- audio_utils: prefer soxr / scipy.signal.resample_poly for resampling
  (anti-aliased polyphase) when installed, falling back to linear only
  as a last resort. Avoids aliasing on 44.1/48 kHz -> 16 kHz downsamples.
- transformation: bare `es` now maps to es-ES (Castilian) instead of
  es-US, matching BCP-47 conventions.

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore: trigger CI re-run [stabilize loop 1/3]

* Update litellm/llms/nvidia_riva/audio_transcription/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* chore: trigger CI re-run [stabilize loop 1/3]

* fix code qa

* fix lint

* fix mypy

* fix mypy

* Fix NVIDIA Riva ASR service lookup

* Fix NVIDIA Riva transcription payload logging

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: oss-pr-review-agent-shin[bot] <281797381+oss-pr-review-agent-shin[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
2026-05-05 17:17:51 -07:00
Yuneng Jiang
201fa5d42b
[Infra] Packaging: Drop floor-check workflow + bound importlib-metadata
Removing the new check-dependency-floors.yml workflow. It only fires when
pyproject.toml changes, which is rare; for those PRs, a maintainer can
run the same check by hand with one command. Documented that command in
a pyproject.toml comment next to the deps.

Also adds the missing upper bound on importlib-metadata (>=8.0.0,<9.0)
for consistency with every other entry in the list.
2026-05-05 16:51:56 -07:00
yuneng-jiang
e84282b7b3
[Infra] Bump deps (#27157)
* bump: version 0.4.70 → 0.4.71

* bump: version 0.1.39 → 0.1.40

* uv lock
2026-05-05 15:58:05 -07:00
Yuneng Jiang
eff0f8c630
[Infra] Packaging: Bump compiled-dep floors for cp313 wheel coverage
CI matrix on Python 3.13 caught three floors that predate cp313 prebuilt
wheels and would force users into a Rust/C build:

- tiktoken: 0.7.0 -> 0.8.0 (cp313 wheels start at 0.8)
- tokenizers: 0.20.0 -> 0.21.0 (cp313 wheels start at 0.21; sdist's
  pyproject.toml pre-0.21 is also malformed for modern build backends)
- pydantic: 2.5.0 -> 2.10.0 (pydantic-core cp313 wheels start at 2.27,
  shipped with pydantic 2.10)

Verified locally on Python 3.10 and 3.13: install at lowest-direct +
import litellm + import every openai-namespace symbol the codebase uses
all pass.
2026-05-05 15:48:59 -07:00
Yuneng Jiang
3d55afe38b
[Infra] Packaging: Relax Core Runtime Pins To Ranges
The 12 core `[project.dependencies]` entries in pyproject.toml were exact
`==` pins, a side effect of the Poetry → uv migration. This forces every
downstream package that lists litellm as a dependency to downgrade common
runtime libraries (openai, pydantic, aiohttp, click, jsonschema, ...) to
the exact versions we ship. Customers have flagged this as a coexistence
blocker.

Switch to lower-bounded ranges with upper bounds where the upstream
package is pre-1.0 or has a known breaking-major-version policy.
Reproducibility for our Docker proxy and CI continues to come from
`uv.lock`, which is regenerated here as a metadata-only diff (no
resolved versions or hashes change).

Inspired by #26157 (which got stranded on `litellm_oss_staging_04_21_2026`
when the forward-merge to internal staging in #26216 was closed). Floors
in this PR are tighter than #26157's: they were validated by installing
litellm at `--resolution=lowest-direct` and importing the openai-namespace
symbols the codebase actually uses.

Floor highlights vs #26157:
- openai >= 2.20 (was 2.0) — Responses API symbols + `Omit` need a 2.x mid-range floor
- httpx >= 0.28, < 1.0 (was no upper) — pre-1.0
- importlib-metadata >= 8.0 (was 6.0) — stay in tested major
- tokenizers >= 0.20, < 1.0 (was 0.19, no upper) — pre-1.0
- aiohttp >= 3.10, < 4.0 (was no upper) — bound major
- pydantic >= 2.5, < 3.0 — kept
- All other floors: keep tested major, add upper bound

Adds a `check-dependency-floors.yml` GitHub Actions workflow that
installs litellm at `--resolution=lowest-direct` on Python 3.10 and 3.13
and import-checks every openai symbol the codebase uses, so a future
floor regression fails fast in CI rather than silently in the field.
2026-05-05 15:45:13 -07:00
user
bfdd786962
chore(deps): refresh dependency locks 2026-05-04 11:36:18 -07:00
Mateo Wang
05439530c2
Merge branch 'litellm_internal_staging' into litellm_vcr-cassette-llm-tests-af37 2026-05-01 14:37:48 -07:00
Yuneng Jiang
6da13efcec
uv lock 2026-04-30 21:40:09 -07:00
Cursor Agent
05333e42ba
tests(llm_translation): switch to pytest-recording for marker-based bulk capture
Per Yuneng's feedback, use a single @pytest.mark.vcr marker so one record
sweep populates cassettes for every marked test across all providers,
instead of forcing each test to bind to a hard-coded cassette path.

Changes vs. the initial scaffolding:

- Add 'pytest-recording==0.13.4' on top of vcrpy. Adopt its layout:
  cassettes live at 'cassettes/<test_module>/<test_name>.yaml', resolved
  automatically. New tests just decorate with '@pytest.mark.vcr' — no
  imports or path bookkeeping.
- Move the shared filter/match config into a 'vcr_config' fixture in
  'tests/llm_translation/conftest.py' (consumed by pytest-recording for
  every marked test in the dir). Drop the standalone 'vcr_config.py'.
- Bulk record / replay via the standard '--record-mode' CLI flag:
  'make test-llm-translation-record' now sweeps every '@pytest.mark.vcr'
  test under tests/llm_translation in one shot. Optional 'TARGET=' var
  scopes to a single file.
- Move existing cassettes to the per-test paths and update the local
  in-process Anthropic regenerator to write to the same paths.
- Refresh README + Makefile target docs to match the sweep workflow.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-04-30 18:08:57 +00:00
Cursor Agent
94b319c577
tests(llm_translation): add VCR cassette infrastructure for offline replay
Live LLM e2e tests have been draining provider billing accounts and going
flaky on outages (LIT-2683). This change introduces vcrpy-backed cassette
replay so CI can exercise the same end-to-end LiteLLM transformation paths
without hitting the live provider:

- Add 'vcrpy==8.1.1' to the dev dependency group.
- New 'tests/llm_translation/vcr_config.py' centralises the VCR config:
  filters auth/secret headers and per-request response headers, matches on
  method+URI+body, and exposes 'LITELLM_VCR_RECORD_MODE' for re-recording.
- New 'tests/llm_translation/test_anthropic_completion_vcr.py' demonstrates
  the pattern with one non-streaming and one streaming Anthropic test that
  replay from cassettes shipped under 'cassettes/'.
- New 'tests/llm_translation/cassettes/_record_anthropic_fixtures.py' lets
  contributors regenerate the canned Anthropic cassettes against a local
  in-process mock (no API key required), and 'cassettes/README.md' documents
  the full record/replay/refresh workflow.
- New 'make test-llm-translation-record FILE=...' Makefile target to refresh
  cassettes against the live API.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
2026-04-30 00:45:50 +00:00
Yuneng Jiang
b4d9006f92
uv lock 2026-04-28 17:43:36 -07:00
Yuneng Jiang
a10fff888d
uv lock 2026-04-25 19:32:41 -07:00
user
4d74a30412
chore(deps): fix brace-expansion pin and revert risky dev bumps
- Dockerfile: pin the unscoped `brace-expansion@5.0.5` alongside
  `@isaacs/brace-expansion@5.0.1`. The scoped package only has 5.0.0
  and 5.0.1 published; CVE-2026-33750's fix (5.0.5) is on the unscoped
  package which npm also vendors. The override loop now swaps both.
- Revert `black` 26.3.1 -> 24.10.0, `pytest` 9.0.3 -> 8.3.5, and
  `pytest-asyncio` 1.3.0 -> 1.2.0. The major-version bumps cause CI
  lint (black reformats hundreds of files) and code-quality
  (liccheck.ini has no entry for the new versions) failures. Both
  CVEs are dev-only; skipping leaves no runtime exposure.
2026-04-24 00:37:07 +00:00
user
fed1a14646
chore(deps): bump vulnerable dependencies
Closes Nexus IQ policy violations and open Dependabot alerts for
shipped Python deps and runtime-stage npm pins in the Docker image.
2026-04-24 00:36:59 +00:00
Yuneng Jiang
ffaeff54cd
add uv 2026-04-23 17:00:20 -07:00
Yuneng Jiang
95fa7678af
uv lock 2026-04-22 18:25:37 -07:00
Yuneng Jiang
e65d547c4d
adding uv lock 2026-04-21 18:10:47 -07:00
ishaan-berri
2f22a1293e
bump litellm-proxy-extras to 0.4.67 (#26043)
* bump litellm-proxy-extras version to 0.4.67

* bump litellm-proxy-extras pin to 0.4.67 in litellm pyproject

* regenerate uv.lock for litellm-proxy-extras 0.4.67

* bump litellm-enterprise version to 0.1.38

* bump litellm-enterprise pin to 0.1.38 in litellm pyproject

* regenerate uv.lock for litellm-enterprise 0.1.38
2026-04-18 19:03:56 -07:00
Yuneng Jiang
49ba6b8160
add uv lock 2026-04-18 18:43:09 -07:00
Yuneng Jiang
9bdb3b1772
chore: lower python floor from 3.11 to 3.10
All three dependency bumps in this PR resolve on Python 3.10, so there
is no need to jump the floor all the way to 3.11. Also restore the
py3.10-specific lunary==1.4.36 pin that was collapsed when the floor
was temporarily at 3.11.
2026-04-18 12:50:04 -07:00
Yuneng Jiang
d1e665742b
chore: drop stale python_version markers after floor raise
Now that requires-python starts at 3.11, the "python_version >= '3.9'"
and ">= '3.10'" markers are unconditionally true, and the "< '3.10'"
entries for psycopg, Pillow, pyarrow, langchain, lunary, and pylint can
never resolve. Drop the dead markers and remove the unreachable pins so
the dependency list reflects what actually gets installed.
2026-04-18 12:31:53 -07:00
Yuneng Jiang
1c29c5e903
chore: bump proxy deps and raise python floor to 3.11
Bumps orjson, fastapi-sso, and python-multipart to their latest releases
in the proxy extra, and raises the project python floor to 3.11 so the
updated pins can resolve. CI already runs on 3.11 / 3.12 / 3.13 and the
Docker images ship python 3.13, so the floor change aligns the declared
support range with what is actually tested and shipped.
2026-04-18 12:16:35 -07:00
Ishaan Jaffer
375cfb7f95
chore: update uv.lock after merging main 2026-04-17 12:56:23 -07:00
Yuneng Jiang
c294bbe4f0
fix(deps): pin langgraph-prebuilt==1.0.8 to avoid broken 1.0.9
langgraph-prebuilt 1.0.9 imports ExecutionInfo and ServerInfo from
langgraph.runtime, but those symbols are not exported until
langgraph 1.1.0. Our pin of langgraph==1.0.10 allows
langgraph-prebuilt<1.1.0,>=1.0.8, and uv resolves to 1.0.9 (the
latest in range), which breaks at import time in every test that
touches langgraph.prebuilt (e.g. tests/pass_through_tests/test_mcp_routes.py):

  ImportError: cannot import name 'ExecutionInfo' from 'langgraph.runtime'

Pinning langgraph-prebuilt to 1.0.8 pairs correctly with
langgraph==1.0.10 and restores the import path.
2026-04-16 09:36:05 -07:00
Yuneng Jiang
dafa1bf97c
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_apr15
# Conflicts:
#	litellm/litellm_core_utils/litellm_logging.py
#	uv.lock
2026-04-16 09:17:20 -07:00
Brendan Smith-Elion
265a960472
fix(noma-v2): fall back to key_alias for application_id in Noma dashboard (#25795)
Noma v1 resolved application_id from user_api_key_alias when no explicit
value was set (PR #16832). Noma v2 (PR #21400) was rewritten from scratch
and this fallback was not ported, causing all requests from shared LiteLLM
instances to appear as a single generic "litellm" application in the Noma
dashboard — breaking per-user traceability.

Fix: after checking dynamic_params and self.application_id, fall back to
user_api_key_alias from litellm_metadata or metadata. This matches the
pattern used by PromptSecurityGuardrail._resolve_key_alias_from_request_data()
and restores the v1 behavior where each API key gets its own application
entry in the Noma dashboard.
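
The resolution order described in the fix might look like the sketch below. The function is a standalone illustration with a hypothetical signature, not the guardrail's actual method.

```python
from typing import Any, Mapping, Optional


def resolve_application_id(
    dynamic_params: Mapping[str, Any],
    configured_application_id: Optional[str],
    request_data: Mapping[str, Any],
) -> Optional[str]:
    """Explicit values win; otherwise fall back to the API key alias."""
    explicit = dynamic_params.get("application_id")
    if explicit:
        return explicit
    if configured_application_id:
        return configured_application_id
    # Fallback: user_api_key_alias from litellm_metadata, then metadata.
    for key in ("litellm_metadata", "metadata"):
        metadata = request_data.get(key) or {}
        alias = metadata.get("user_api_key_alias")
        if alias:
            return alias
    return None


print(resolve_application_id({}, None, {"metadata": {"user_api_key_alias": "team-a-key"}}))
```

With no explicit or configured value, each key alias becomes its own application entry, which is the v1 behavior the commit restores.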

Fixes #25794

Co-authored-by: Brendan Smith-Elion <brendan.smith-elion@arcadia.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 19:04:24 +05:30
Ishaan Jaffer
9114b0da96
fix(ci): sync uv.lock with pyproject.toml 2026-04-15 18:16:22 -07:00
jayden
0a1b4427a6
fix(guardrails): replace custom_code sandbox with RestrictedPython 2026-04-15 15:13:52 -07:00
Yuneng Jiang
83c459225c
[Fix] CI: fix GHA timeouts and uv lock --check failures
1. exclude-newer: change from absolute "2026-04-10" to relative "3 days".
   All pinned deps were published before the 3-day cutoff. Re-locked so
   uv lock --check passes in test-mcp.yml and test-linting.yml.

2. test_eager_tiktoken_load: run all 10 env var values in a single
   subprocess instead of spawning 10 separate processes. Each cold
   import litellm takes ~78s on CI, so the old loop took ~13 min on a
   single xdist worker. Now takes ~78s total.

3. proxy-db remaining timeout: increase from 20 to 30 minutes. The
   remaining group has 51 test files and was consistently timing out at
   71% across all branches (pre-existing issue, not migration-related).
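
Item 2's saving is simple arithmetic: ten cold starts at ~78s each is ~780s, or about 13 minutes, while one process pays the start-up cost once. A rough sketch of the batching pattern follows; the value list and `parse_flag` are hypothetical stand-ins, and the real test imports litellm rather than running this toy child script.

```python
import json
import subprocess
import sys

# Hypothetical child script: evaluates every value in one interpreter
# instead of paying the start-up cost once per value.
CHILD = """
import json
import sys

def parse_flag(value):
    return value.strip().lower() in {"true", "1", "yes", "on"}

values = json.loads(sys.argv[1])
print(json.dumps({value: parse_flag(value) for value in values}))
"""


def check_all_values(values):
    """Run one subprocess for the whole batch and return its parsed results."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, json.dumps(values)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


print(check_all_values(["true", "1", "yes", "false", "0", "no"]))
```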
2026-04-11 09:04:49 -07:00
Yuneng Jiang
d9a460277a
[Fix] CI: fix uv lock resolution and tiktoken test timeout
1. Cap requires-python to <3.14 — no deps ship 3.14 wheels yet, and
   uv's cross-version resolver fails on the Python 3.14 split.
2. Change exclude-newer from relative "30 days" to absolute "2026-04-10"
   so the lockfile stays reproducible. The relative date caused
   cryptography==46.0.7 (published April 8) to fall outside the window.
3. Parametrize test_eager_loading_env_var_values instead of looping —
   with xdist the 6 subprocess cases can run in parallel instead of all
   running sequentially on one worker (~13 min → ~2 min).
   Also removed redundant case variants (Yes/YES/On/ON) that test the
   same str_to_bool code path.
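
The date arithmetic behind item 2 can be checked with a short sketch. It assumes the lock was run on the commit date and that "April 8" means 2026-04-08; it models only the cutoff comparison, not uv's own implementation.

```python
from datetime import date, timedelta


def is_excluded(published: date, cutoff: date) -> bool:
    """A release is excluded when it was published after the cutoff."""
    return published > cutoff


lock_day = date(2026, 4, 10)   # assumed: the day the lock was regenerated
published = date(2026, 4, 8)   # cryptography==46.0.7 (year assumed)

relative_cutoff = lock_day - timedelta(days=30)   # "30 days" back from lock_day
absolute_cutoff = date(2026, 4, 10)               # the pinned absolute date

print(relative_cutoff, is_excluded(published, relative_cutoff))
print(absolute_cutoff, is_excluded(published, absolute_cutoff))
```

Under the relative setting the cutoff falls on 2026-03-11, so the April 8 release is shut out; the absolute date admits it and does not move between runs.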
2026-04-10 22:21:15 -07:00
user
8d1493ed08
fix(security): bump vulnerable dependencies
pip:
- cryptography 43.0.3 → 46.0.7 (5 CVEs including CVSS 8.2 ECDH key leak)

npm:
- hono 4.1.4/4.12.7 → 4.12.12 (prototype pollution, cookie injection,
  path traversal, middleware bypass, IP matching bypass)
- @hono/node-server 1.19.6 → 1.19.13 (serveStatic middleware bypass)
- vite 7.3.1 → 7.3.2 (file read via WebSocket, path traversal, fs.deny bypass)
- lodash override 4.17.23 → 4.18.1 (code injection via _.template,
  prototype pollution via _.unset/_.omit)

mlflow left at 3.9.0 — 2 of 3 alerts have no upstream fix, and
3.11.1 is blocked by exclude-newer (transitive dep chain).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:35:19 +00:00
stuxf
a6c30b30bf
build: migrate packaging, CI, and Docker from Poetry to uv (#25007)
* build: migrate packaging metadata to uv

* ci: move automation and local tooling to uv

* docker: migrate image builds and runtime setup to uv

* docs: update install and deployment guidance for uv

* chore: align auxiliary scripts and tests with uv

* test: harden test_litellm isolation

* fix: keep release and health check images self-contained

* build: pin uv tooling and health check deps

* test: isolate bedrock image request formatting from suite state

* test: cover sandbox executor requirements flow

* ci: fix circleci no-op command steps

* ci: fix circleci publish workflow parsing

* fix: stabilize remaining uv migration CI checks

* ci: increase matrix test timeout headroom

* fix: restore published docker and license coverage

* fix: restore proxy runtime build parity

* fix: restore proxy extras parity and venv migrations

* ci: persist uv path across circleci steps

* fix: keep psycopg binary in default test env

* docker: preserve prisma cache across stages

* test: run local proxy checks through uv python

* build: restore runtime deps moved into ci

* build: refresh uv lock after upstream merge

* fix: restore module import in test_check_migration after merge

The conflict resolution imported only the function but the test body
references check_migration as a module throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching

- Move google-generativeai, Pillow, tenacity back to ci group (they are
  lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
  in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
  from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
  environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
  deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate uv.lock after removing nodejs-wheel-binaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): use cache/restore instead of cache to prevent cache poisoning

The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert

The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv cache in publish workflow

Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate verbose_logger mock in test_check_migration

The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): free disk space before Docker build in test-server-root-path

The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:46:23 -07:00
Ryan Malloy
f76938af5e
fix(ollama): set finish_reason to tool_calls and remove broken capability check (#18924)
* Update CLAUDE.md with qwen3 tool_calls bug fix instructions (#18922)

* fix(ollama): set finish_reason to "tool_calls" when tool_calls present

When qwen3 models return tool_calls through Ollama, the finish_reason
was incorrectly left as "stop" instead of being set to "tool_calls".
This caused clients to miss the tool_calls in the response.

Added _get_finish_reason helper method following OpenAI provider's
pattern, and fixed both streaming and non-streaming response paths.
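
A minimal sketch of the rule the helper enforces, written as a standalone function with a hypothetical signature rather than the provider's actual `_get_finish_reason` method:

```python
from typing import Any, Mapping


def get_finish_reason(message: Mapping[str, Any], done_reason: str = "stop") -> str:
    """Report "tool_calls" whenever the message carries tool calls."""
    if message.get("tool_calls"):
        return "tool_calls"
    return done_reason


print(get_finish_reason({"content": "", "tool_calls": [{"function": {"name": "get_weather"}}]}))
print(get_finish_reason({"content": "hello"}))
```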

Fixes: https://github.com/BerriAI/litellm/issues/18922

* fix(ollama): pass tools directly without model capability check

The previous code tried to check model capability via get_model_info()
which made network calls to localhost:11434. When Ollama is remote,
this fails and falls back to JSON format, breaking tool calling.

Ollama 0.4+ supports native tool calling - let Ollama handle
model capability detection instead of LiteLLM.

Fixes #18922

* fix(ollama): transform tool_calls response to OpenAI format

Ollama returns tool_calls with arguments as dict, but OpenAI format
requires arguments to be a JSON string. Also ensures 'type': 'function'
field is present.
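
The conversion can be sketched as below. The function name is hypothetical, and the generated `id` values are an assumption added so the output resembles an OpenAI-style tool call; the commit only mentions the `arguments` and `type` changes.

```python
import json
from typing import Any, Dict, List


def to_openai_tool_calls(ollama_tool_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Serialize dict arguments to a JSON string and set type to "function"."""
    converted = []
    for index, call in enumerate(ollama_tool_calls):
        function = call.get("function", {})
        arguments = function.get("arguments", {})
        if not isinstance(arguments, str):
            # OpenAI-format clients expect a JSON string here, not a dict.
            arguments = json.dumps(arguments)
        converted.append(
            {
                "id": call.get("id", f"call_{index}"),  # assumed fallback id
                "type": "function",
                "function": {"name": function.get("name", ""), "arguments": arguments},
            }
        )
    return converted


calls = to_openai_tool_calls(
    [{"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}]
)
print(calls)
```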

Completes the fix for #18922

* fix(ollama): set finish_reason to "tool_calls" when tool_calls present

Fixes #18922

Two issues addressed:

1. Remove broken model capability check
   - get_model_info() fails when Ollama runs on remote server
   - Broken fallback triggered JSON prompt injection
   - Now passes tools directly - Ollama 0.4+ handles detection

2. Set finish_reason correctly
   - Was hardcoded to "stop" even with tool_calls present
   - Clients use this to know how to process the response
   - Now returns "tool_calls" when tool_calls are in response

Both streaming and non-streaming responses are fixed.

Tests:
- All 14 existing Ollama tests pass
- Added 3 focused tests for the fixes
2026-01-14 03:52:26 +05:30