* build: migrate packaging metadata to uv
* ci: move automation and local tooling to uv
* docker: migrate image builds and runtime setup to uv
* docs: update install and deployment guidance for uv
* chore: align auxiliary scripts and tests with uv
* test: harden test_litellm isolation
* fix: keep release and health check images self-contained
* build: pin uv tooling and health check deps
* test: isolate bedrock image request formatting from suite state
* test: cover sandbox executor requirements flow
* ci: fix circleci no-op command steps
* ci: fix circleci publish workflow parsing
* fix: stabilize remaining uv migration CI checks
* ci: increase matrix test timeout headroom
* fix: restore published docker and license coverage
* fix: restore proxy runtime build parity
* fix: restore proxy extras parity and venv migrations
* ci: persist uv path across circleci steps
* fix: keep psycopg binary in default test env
* docker: preserve prisma cache across stages
* test: run local proxy checks through uv python
* build: restore runtime deps moved into ci
* build: refresh uv lock after upstream merge
* fix: restore module import in test_check_migration after merge
The conflict resolution imported only the function but the test body
references check_migration as a module throughout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching
- Move google-generativeai, Pillow, tenacity back to ci group (they are
lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate uv.lock after removing nodejs-wheel-binaries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): use cache/restore instead of cache to prevent cache poisoning
The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert
The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): disable setup-uv cache in publish workflow
Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): remove duplicate verbose_logger mock in test_check_migration
The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): free disk space before Docker build in test-server-root-path
The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Baseten Model API pricing entries for Nemotron, GLM, Kimi, GPT OSS, and DeepSeek models with validated model slugs. Include a focused regression test to assert provider and per-token pricing values.
Made-with: Cursor
* fix(vertex_ai): support pluggable (executable) credential_source for WIF auth (#24700)
The WIF credential dispatch in load_auth() only handled identity_pool and
aws credential types. When credential_source.executable was present (used
for Azure Managed Identity via Workload Identity Federation), it fell
through to identity_pool.Credentials which rejected it with MalformedError.
Add dispatch to google.auth.pluggable.Credentials for executable-type
credential sources, following the same pattern as the existing identity_pool
and aws helpers.
Fixes authentication for Azure Container Apps → GCP Vertex AI via WIF
with executable credential sources.
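A minimal sketch of the dispatch order described above, assuming the credential type can be decided from the `credential_source` keys alone. `pick_wif_credential_kind` is a hypothetical helper that returns a label instead of constructing a real google-auth credentials object. It is not the code in load_auth().

```python
from typing import Any, Dict


def pick_wif_credential_kind(json_obj: Dict[str, Any]) -> str:
    """Return which credential family a WIF config should dispatch to.

    Hypothetical helper: the real code would build the matching
    google.auth credentials object rather than return a string.
    """
    source = json_obj.get("credential_source") or {}
    # Assumption: AWS-style configs carry an environment_id such as "aws1".
    if str(source.get("environment_id") or "").startswith("aws"):
        return "aws"
    # Executable sources must be checked before the identity_pool fallback,
    # otherwise they fall through and are rejected as malformed.
    if "executable" in source:
        return "pluggable"
    return "identity_pool"
```

The point of the sketch is the ordering: the `executable` check runs before the identity_pool fallback.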
* feat(logging): add component and logger fields to JSON logs for 3rd p… (#24447)
* feat(logging): add component and logger fields to JSON logs for 3rd party filtering
* Let user-supplied extra fields win over auto-generated component/logger, tighten test assertions
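A rough illustration of the precedence rule above, using only the standard library. The `JsonFormatter` class, the `extra_fields` attribute name, and deriving `component` from the first segment of the logger name are all assumptions made for this sketch, not the project's actual formatter.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Toy JSON formatter that adds component/logger fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            # Assumption: component is the first dotted segment of the logger name.
            "component": record.name.split(".")[0],
            "logger": record.name,
        }
        # User-supplied fields are applied last so they override the
        # auto-generated component/logger values.
        payload.update(getattr(record, "extra_fields", {}))
        return json.dumps(payload)
```

With this shape, a call such as `logger.info("msg", extra={"extra_fields": {"component": "custom"}})` would emit `"component": "custom"` rather than the derived value.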
* Feat - Add organization into the metrics metadata for org_id & org_alias (#24440)
* Add org_id and org_alias label names to Prometheus metric definitions
* Add user_api_key_org_alias to StandardLoggingUserAPIKeyMetadata
* Populate user_api_key_org_alias in pre-call metadata
* Pass org_id and org_alias into per-request Prometheus metric labels
* Add test for org labels on per-request Prometheus metrics
* chore: resolve test mockdata
* Address review: populate org_alias from DB view, add feature flag, use .get() for org metadata
* Add org labels to failure path and verify flag behavior in test
* Fix test: build flag-off enum_values without org fields
* Gate org labels behind feature flag in get_labels() instead of static metric lists
* Scope org label injection to metrics that carry team context, remove orphaned budget label defs, add test teardown
* Use explicit metric allowlist for org label injection instead of team heuristic
* Fix duplicate org label guard, move _org_label_metrics to class constant
* Reset custom_prometheus_metadata_labels after duplicate label assertion
* fix: emit org labels by default, remove flag, fix missing org_alias in all metadata paths
* fix: emit org labels by default, no opt-in flag required
* fix: write org_alias to metadata unconditionally in proxy_server.py
* fix: 429s from batch creation being converted to 500 (#24703)
* add us gov models (#24660)
* add us gov models
* added max tokens
* Litellm dev 04 02 2026 p1 (#25052)
* fix: replace hardcoded url
* fix: Anthropic web search cost not tracked for Chat Completions
The ModelResponse branch in response_object_includes_web_search_call()
only checked url_citation annotations and prompt_tokens_details, missing
Anthropic's server_tool_use.web_search_requests field. This caused
_handle_web_search_cost() to never fire for Anthropic Claude models.
Also routes vertex_ai/claude-* models to the Anthropic cost calculator
instead of the Gemini one, since Claude on Vertex uses the same
server_tool_use billing structure as the direct Anthropic API.
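As an illustration of the missing check, assuming the usage block is available as a plain dict. `usage_includes_web_search` is a hypothetical helper, not the body of response_object_includes_web_search_call().

```python
from typing import Any, Dict


def usage_includes_web_search(usage: Dict[str, Any]) -> bool:
    """Return True when Anthropic-style usage reports web search requests.

    Hypothetical helper: only the server_tool_use.web_search_requests
    field name comes from the commit message.
    """
    server_tool_use = usage.get("server_tool_use") or {}
    requests = server_tool_use.get("web_search_requests") or 0
    return requests > 0
```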
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(anthropic): pass logging_obj to client.post for litellm_overhead_time_ms (#24071)
When LITELLM_DETAILED_TIMING=true, litellm_overhead_time_ms was null for
Anthropic because the handler did not pass logging_obj to client.post(),
so track_llm_api_timing could not set llm_api_duration_ms. Pass
logging_obj=logging_obj at all four post() call sites (make_call,
make_sync_call, acompletion, completion). Add test to ensure make_call
passes logging_obj to client.post.
Made-with: Cursor
* sap - add additional parameters for grounding
- add an additional grounding parameter for the SAP provider
* sap - fix models
* (sap) add filtering, masking, translation SAP GEN AI Hub modules
* (sap) add tests and docs for new SAP modules
* (sap) add support for multiple modules in config
* (sap) code refactoring
* (sap) rename file
* test(): add safeguard tests
* (sap) update tests
* (sap) update docs, solve merge conflict in transformation.py
* (sap) linter fix
* (sap) Align embedding request transformation with current API
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) mock commit
* (sap) run black formatter
* (sap) add literals to models, add negative tests, fix test for tool transformation
* (sap) fix formatting
* (sap) fix models
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) commit for rerun bot review
* (sap) minor improvement
* (sap) fix after bot review
* (sap) lint fix
* docs(sap): update documentation
* fix(sap): change creds priority
* fix(sap): fix sap creds unit test
* fix(sap): linter fix
* fix(sap): linter fix
* linter fix
* (sap) update logic of fetching creds, add additional tests
* (sap) clean up code
* (sap) fix after review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) allow the service key to be supplied in either variant
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) update test
* (sap) update service key resolve function
* (sap) run black formatter
* (sap) fix validate credentials, add negative tests for credential fetching
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) lint fix
* (sap) lint fix
* feat: support service_tier in gemini
* chore: add a service_tier field mapping from openai to gemini
* fix: use x-gemini-service-tier header in response
* docs: add service_tier to gemini docs
* chore: add default/standard mapping, and some tests
* chore: tidying up some case insensitivity
* chore: remove unnecessary guard
* fix: remove redundant test file
* fix: handle 'auto' case-insensitively
* fix: return service_tier on final streamed chunk
* chore: black
* feat: enable supports_service_tier for gemini models
* Fix get_standard_logging_metadata tests
* Fix test_get_model_info_bedrock_models
* Fix remaining tests
* Fix mypy issues
* Fix tests
* Fix merge conflicts
* Fix code qa
* Fix code qa
* Fix code qa
* Fix greptile review
---------
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Josh <36064836+J-Byron@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Alperen Kömürcü <alperen.koemuercue@sap.com>
Co-authored-by: Vasilisa Parshikova <vasilisa.parshikova@sap.com>
Co-authored-by: Lin Xu <lin.xu03@sap.com>
Co-authored-by: Mark McDonald <macd@google.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Fixes 'LLM Provider NOT provided' errors when models are configured with
custom_llm_provider but model names lack provider prefix (e.g., 'gpt-4.1-mini'
instead of 'azure/gpt-4.1-mini').
Changes:
- Router now passes deployment's custom_llm_provider to get_llm_provider()
- Fixes 6 code paths: file creation, file content, batch operations, vector store
- Adds regression tests for file creation and file content operations
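The failure mode and the fix can be sketched with a toy resolver. `resolve_provider` is a hypothetical stand-in for get_llm_provider() and only models the two cases named above: an explicit provider from the deployment, or a `provider/model` prefix.

```python
from typing import Optional, Tuple


def resolve_provider(
    model: str, custom_llm_provider: Optional[str] = None
) -> Tuple[str, str]:
    """Return (model_name, provider). Toy stand-in, not the real function."""
    if custom_llm_provider:
        # The deployment's explicit provider wins, so no prefix is needed.
        return model, custom_llm_provider
    if "/" in model:
        provider, _, name = model.partition("/")
        return name, provider
    raise ValueError(f"LLM Provider NOT provided for model={model!r}")
```

Before the fix, the affected code paths behaved like the call without the second argument, so an unprefixed name such as 'gpt-4.1-mini' raised even though the deployment declared its provider.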
Made-with: Cursor
* feat(triton): add embedding usage estimation for self-hosted responses
Populate Triton embedding usage from request input using token counting with a safe fallback so cost/observability flows work even when provider usage is missing.
Made-with: Cursor
* fix(triton): sum per-input embedding token counts for batches
Joining batch strings with newlines before token_counter added spurious
tokens. Count each input separately and sum, matching OpenAI-style usage.
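A small sketch of why joining inflates the count. `toy_token_count` is a made-up tokenizer that treats every newline as one token. It stands in for token_counter only to make the effect visible.

```python
import re
from typing import List


def toy_token_count(text: str) -> int:
    """Toy tokenizer: each newline and each run of non-whitespace is one token."""
    return len(re.findall(r"\n|[^\s]+", text))


def joined_count(inputs: List[str]) -> int:
    # Old behavior: join the batch with newlines, then count once.
    return toy_token_count("\n".join(inputs))


def summed_count(inputs: List[str]) -> int:
    # New behavior: count each input on its own and sum the results.
    return sum(toy_token_count(text) for text in inputs)
```

For a batch of n inputs, the joined variant in this toy model reports n - 1 extra tokens, one per separator.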
Made-with: Cursor
Using setdefault('litellm_metadata', {}) unconditionally created an empty
litellm_metadata key for chat completions and embeddings. This caused
_get_metadata_variable_name_from_kwargs to return 'litellm_metadata' instead
of 'metadata', so tag-based routing looked for tags in the wrong dict and
ignored all tag filters.
Fix: only set the encrypted_content_affinity_enabled flag when litellm_metadata
already exists (Responses API path). Chat completions and embeddings never have
this key, so nothing is created and tag routing works correctly.
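The bug comes down to `dict.setdefault` creating a key as a side effect. A minimal sketch, assuming the metadata variable name is chosen purely by key presence. `metadata_key` is a simplified stand-in for _get_metadata_variable_name_from_kwargs, and both setter functions are hypothetical.

```python
from typing import Any, Dict


def metadata_key(kwargs: Dict[str, Any]) -> str:
    """Simplified stand-in: pick the metadata dict by key presence."""
    return "litellm_metadata" if "litellm_metadata" in kwargs else "metadata"


def set_flag_buggy(kwargs: Dict[str, Any]) -> None:
    # setdefault creates litellm_metadata even when the request never had it.
    kwargs.setdefault("litellm_metadata", {})[
        "encrypted_content_affinity_enabled"
    ] = True


def set_flag_fixed(kwargs: Dict[str, Any]) -> None:
    # Only touch litellm_metadata when it already exists (Responses API path).
    if "litellm_metadata" in kwargs:
        kwargs["litellm_metadata"]["encrypted_content_affinity_enabled"] = True
```

With a chat-completion style request such as `{"metadata": {"tags": ["prod"]}}`, the buggy setter makes `metadata_key` return 'litellm_metadata', so the tags in `metadata` are never consulted.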
- Add constants.ts with all required exports (key aliases, team IDs)
- Add fixtures/users.ts with all role definitions and storage paths
- Add fixtures/seed.sql for deterministic test database seeding
- Remove Firefox project from playwright config (only Chromium installed)
- Remove unused variable in teams.spec.ts
- Rename CircleCI job to e2e_ui_testing
Add Playwright E2E tests covering proxy admin team and key management
workflows, with a self-contained test runner and CircleCI integration.
Tests cover: create team, invite user, edit/delete team members, create
key in team, regenerate key, update TPM/RPM limits, delete key, and
verify internal user keys are visible.
Infrastructure: run_e2e.sh builds the UI from source before starting
the proxy, ensuring tests always run against the latest UI changes.
Added data-testid attributes to key UI components for reliable selectors.
Redis caching unit tests (test_dual_cache, test_redis_batch_optimizations,
test_router_utils) required Redis secrets that should live in CircleCI.
- Add redis_caching_unit_tests job to CircleCI config
- Delete test-unit-caching-redis.yml GHA workflow
- Remove all Redis plumbing (inputs, secrets, env vars) from
_test-unit-services-base.yml and its callers
- Move os and MCP_STDIO_ALLOWED_COMMANDS imports to module level in mcp_server_manager.py
- Move MCP_STDIO_ALLOWED_COMMANDS import to module level in _types.py
- Change defense-in-depth warning to HTTPException 403 for legacy non-allowlisted commands
- Ensures arbitrary command execution is blocked for both new and legacy MCP servers
Addresses Greptile review comments:
- P2: Inline imports violate CLAUDE.md style guide
- P1 security: Defense-in-depth should block, not warn, for legacy commands
Made-with: Cursor
- Defense-in-depth: warn instead of hard-fail for legacy servers
- Move os import to module level in _types.py
- Document args residual risk in allowlist comment
- Add UpdateMCPServerRequest allowlist test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add command allowlist for MCP stdio transport to prevent RCE via
/mcp-rest/test/* endpoints. Restrict test endpoints to PROXY_ADMIN
role. Fix docker/README.md MASTER_KEY -> LITELLM_MASTER_KEY.
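A sketch of an allowlist check of this kind. The contents of `ALLOWED_COMMANDS` and the `validate_stdio_command` helper are assumptions for illustration. The real allowlist is MCP_STDIO_ALLOWED_COMMANDS, whose values are not shown here, and the proxy raises an HTTP error rather than `PermissionError`.

```python
from typing import FrozenSet

# Hypothetical allowlist values, for illustration only.
ALLOWED_COMMANDS: FrozenSet[str] = frozenset({"npx", "uvx", "python"})


def validate_stdio_command(
    command: str, allowed: FrozenSet[str] = ALLOWED_COMMANDS
) -> str:
    """Reject any stdio command that is not exactly on the allowlist."""
    # Exact match only: paths like "/tmp/evil/python" are rejected too.
    if command not in allowed:
        raise PermissionError(f"stdio command not allowed: {command!r}")
    return command
```

A check like this only constrains the executable. Arguments passed to an allowed command remain a residual risk.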
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /v2/key/info endpoint was missing response filtering that
the v1 /key/info endpoint already had. This aligns the two
endpoints so v2 applies the same per-key permission checks and
strips internal fields from the response. Also fixes the
key_aliases query path to resolve aliases before querying.
Allow JWT tokens matching routing_overrides to use OAuth2 introspection without enabling global OAuth2 while keeping OAuth2 routing limited to LLM/info routes. Add regression coverage for management-route boundary and tighten opaque-token assertions; update docs to reflect selective-mode route scope.
Made-with: Cursor
The .npmrc file (ignore-scripts=true, min-release-age=3d) is temporarily
removed during the Docker build since lifecycle scripts are needed by
npm ci. However, the unconditional `mv` fails when the build context
doesn't include .npmrc (e.g. when LiteLLM is vendored in a subdirectory).
Make all .npmrc mv operations conditional. This is safe because npm ci
already installs from package-lock.json with pinned versions and
integrity hashes.
PR #25258 changed _cleanup_stale_managed_objects from update_many to
execute_raw via _expire_stale_rows, but the tests were not updated.
The tests now mock _expire_stale_rows on the instance and assert
update_many calls only for job completion, not stale cleanup.
- team-admin: assert Admin Settings is not visible (role-specific check)
- proxy-admin: use users[Role.ProxyAdmin].password from constants instead of duplicating the env var fallback inline
Pin all cosign public key references to the immutable commit hash
(0112e53) that first introduced the key, instead of fetching it from
the release tag. This addresses the concern that an attacker with push
access could replace the key on main/tags and re-sign tampered images.
Docs now show two verification methods: commit hash (recommended) and
release tag (convenience), with explanation of why the hash is stronger.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)
* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant
Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.
* Batch-limit stale managed object cleanup with single bounded SQL query
Two fixes to _cleanup_stale_managed_objects:
1. Replace unbounded update_many with a single execute_raw using a
subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
rows. Zero rows loaded into Python memory — everything stays in Postgres.
Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
(the proxy requires PostgreSQL per schema.prisma).
2. Extract _expire_stale_rows as a separate method for testability.
Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.
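The bounded-update pattern can be tried out with SQLite in place of PostgreSQL. In this sketch the table name, column names, and status values are hypothetical. Only the subquery LIMIT shape and the file_purpose='response' filter come from the description above.

```python
import sqlite3

# Stand-in for STALE_OBJECT_CLEANUP_BATCH_SIZE (the commit's default is 1000).
BATCH_SIZE = 2


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with hypothetical column names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE managed_objects ("
        "id INTEGER PRIMARY KEY, file_purpose TEXT, status TEXT, created_at INTEGER)"
    )
    rows = [("response", "in_progress", t) for t in range(1, 6)]
    rows.append(("batch", "in_progress", 1))  # long-running job, never expired
    conn.executemany(
        "INSERT INTO managed_objects (file_purpose, status, created_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def expire_stale_rows(
    conn: sqlite3.Connection, cutoff: int, batch_size: int = BATCH_SIZE
) -> int:
    """Expire at most batch_size stale rows in one statement; return the count."""
    cur = conn.execute(
        """
        UPDATE managed_objects
        SET status = 'stale_expired'
        WHERE id IN (
            SELECT id FROM managed_objects
            WHERE file_purpose = 'response'
              AND status = 'in_progress'
              AND created_at < ?
            ORDER BY created_at
            LIMIT ?
        )
        """,
        (cutoff, batch_size),
    )
    conn.commit()
    return cur.rowcount
```

Each call touches at most `batch_size` rows and no row data is loaded into Python. Repeated poll cycles drain the backlog gradually.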
* docs: add STALE_OBJECT_CLEANUP_BATCH_SIZE to env vars reference
* test: remove deprecated embed-english-v2.0 cohere embedding tests
Adds a new endpoint to bulk-update team_member_permissions across
teams. Supports apply_to_all_teams (with cursor-based pagination)
or a specific list of team_ids. Merges new permissions into each
team's existing set rather than overwriting.
Also fixes test isolation bug in test_get_prompt_info_by_base_id
where leaked prisma_client state from other tests caused a
TypeError on await.
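The merge-rather-than-overwrite behavior of the bulk update can be sketched as follows. `merge_permissions` is a hypothetical helper and the permission strings in the usage note are placeholders.

```python
from typing import Iterable, List


def merge_permissions(existing: Iterable[str], new: Iterable[str]) -> List[str]:
    """Add new permissions to a team's existing ones without dropping any."""
    merged = list(existing)
    for permission in new:
        # Skip duplicates so repeated bulk updates are idempotent.
        if permission not in merged:
            merged.append(permission)
    return merged
```

For example, merging `["/key/update"]` into a team that already has `["/key/generate"]` leaves the team with both permissions, and applying the same update twice changes nothing.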
* Remove redundant matrix unit test workflow
All test paths in test-litellm-matrix.yml are fully covered by the
newer semantic unit test workflows (test-unit-*.yml), making the
matrix workflow redundant CI spend.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add Codecov coverage reporting to semantic unit test workflows
Add coverage collection (--cov) and Codecov OIDC upload to both
reusable base workflows and all 12 caller workflows, replacing the
coverage reporting that was previously only in the matrix workflow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Move id-token/pull-requests permissions to job level for multi-job workflows
For workflows with multiple jobs (llm-providers, proxy-db), move
id-token: write and pull-requests: write from workflow level to job
level so permissions are scoped to only the jobs that need them.
Removes zizmor inline suppressions that were masking the issue.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* [Docs] Enforce Black formatting in contributor docs
Black formatting is now enforced in CI. Update CLAUDE.md, AGENTS.md,
and CONTRIBUTING.md to instruct contributors and AI agents to run
`poetry run black .` before committing, and add VS Code setup guidance.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: fixes
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>