39037 Commits

Author SHA1 Message Date
yuneng-jiang
f2969ca78a
Merge pull request #27165 from BerriAI/litellm_/friendly-lichterman-35cf02
[Fix] CI: Enable VCR replay for test_azure_o_series
2026-05-04 20:59:46 -07:00
Yuneng Jiang
0976fbc6c4
[Fix] Tests: Restore /metrics access for prometheus test suite
/metrics now requires auth by default; tests/otel_tests/test_prometheus.py
makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so
every prometheus test in CI now fails the metric assertion.

Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml
to opt out for this test job, which scrapes /metrics directly. Verified
locally: 8/8 prometheus tests green (one flaky retry on
test_proxy_success_metrics that pre-dates this PR).

Also drop the -x stop-on-first-failure flag from the otel test command
so all failures in the job surface in a single CI run rather than
hiding behind whichever one trips first.
2026-05-04 20:54:54 -07:00
Yuneng Jiang
6a6c79d992
[Fix] CI: Enable VCR replay for test_azure_o_series
The Azure o-series tests were excluded from the conftest's VCR auto-marker
because of a respx/vcrpy transport-patching conflict, but the only respx
reference in the file was an unused `MockRouter` import. Drop the dead
import and remove the file from the conflict set so cassettes record on
first run and replay thereafter, eliminating the 60-95s live Azure latency
that was crashing xdist workers under --timeout=120 thread-mode timeouts.
2026-05-04 20:48:26 -07:00
Sameer Kankute
b0edffb883
Merge pull request #27103 from BerriAI/litellm_azure-deployment-image-body
fix(azure): omit model from deployment image gen and image edit bodies
2026-05-05 09:09:45 +05:30
Yuneng Jiang
e6f524f951
[Fix] Tests: Pick chat-completion OTEL trace by content, not recency
The /otel-spans endpoint returns process-wide spans and tags
most_recent_parent by max start_time. After tightening that route to
proxy_admin (sk-1234), the GET /otel-spans request itself emits auth
spans that beat the chat-completion spans on start_time, so
most_recent_parent now points at the request's own auth trace
(['postgres', 'postgres']) and the >=5-span assertion fails.

Pick the chat-completion trace by content: it is the only trace whose
span list is a superset of {postgres, redis, raw_gen_ai_request,
batch_write_to_db}. Verified locally end-to-end against
otel_test_config.yaml + OTEL_EXPORTER=in_memory: 3/3 runs green.
2026-05-04 20:35:09 -07:00
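Commit e6f524f951 selects the trace by content rather than by recency. The sketch below is a minimal illustration of that idea. It assumes the spans have already been grouped into a hypothetical `{trace_id: [span_name, ...]}` dict; the commit does not show the real /otel-spans response shape or how the test groups it.

```python
from typing import Dict, List, Optional

# Span names the chat-completion trace must contain (taken from the commit message).
REQUIRED_SPANS = {"postgres", "redis", "raw_gen_ai_request", "batch_write_to_db"}


def pick_chat_completion_trace(traces: Dict[str, List[str]]) -> Optional[str]:
    """Return the id of the first trace whose span names cover REQUIRED_SPANS.

    `traces` is a hypothetical mapping of trace id -> span names in that trace.
    Start times are ignored, so an auth trace emitted later cannot win.
    """
    for trace_id, span_names in traces.items():
        if set(span_names) >= REQUIRED_SPANS:  # superset check, order-insensitive
            return trace_id
    return None
```

A trace made only of auth spans, such as `['postgres', 'postgres']`, never satisfies the superset check, however recently it started.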
Sameer Kankute
4487d8352f
Merge pull request #27115 from Sameerlite/litellm_health_check_reasoning_effort
feat(proxy): add health_check_reasoning_effort for model health checks
2026-05-05 09:00:09 +05:30
Yuneng Jiang
8a1b6635fa
[Fix] Tests: Use master key for /otel-spans in test_chat_completion_check_otel_spans
/otel-spans now requires proxy admin (returns 401 'Only proxy admin
can be used to generate, delete, update info for new keys/users/teams.
Route=/otel-spans' for non-admin callers). Switch the GET call to use
the master key sk-1234 while keeping the generated key for the
chat-completion request that produces the spans.
2026-05-04 20:23:11 -07:00
Sameer Kankute
b4ee6a2355
test(proxy): cover health_check_reasoning_effort for completion mode
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 08:52:57 +05:30
Sameer Kankute
bb0e4168ad
refactor(azure): move image gen JSON helper; rename image edit finalize hook
- Add image_generation/http_utils.azure_deployment_image_generation_json_body; call
  from azure.py (keeps AzureChatCompletion focused on chat).
- Rename finalize_image_edit_multipart_data to finalize_image_edit_request_data with
  docstring covering multipart and JSON POST payloads (review feedback).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 08:49:46 +05:30
Yuneng Jiang
193907a4a3
[Fix] Lint: Mark _user_has_admin_view re-export in common_utils
Ruff F401 flagged the aliased import as unused within common_utils.py
because the name is consumed only by external modules (~15 callers
across guardrails, spend tracking, MCP, agents, management endpoints).
Add `# noqa: F401  re-exported` so the alias survives lint while
keeping a single source of truth in litellm.proxy._types.
2026-05-04 20:16:59 -07:00
Yuneng Jiang
8cac6c5bff
[Fix] Proxy: Address Greptile feedback on hook-cycle PR
- Move _user_has_admin_view to litellm.proxy._types as
  user_api_key_has_admin_view (single source of truth). common_utils.py
  and isolation.py both import from there now, removing the duplicated
  role-check that could silently diverge if new admin roles are added.
- Add pytest.importorskip("litellm_enterprise") to the two regression
  tests that assert managed_files / managed_vector_stores are registered;
  those keys come from ENTERPRISE_PROXY_HOOKS so the tests would fail
  unconditionally in a checkout without the enterprise extra installed.
2026-05-04 20:13:31 -07:00
Yuneng Jiang
727ab8dcc4
[Fix] Proxy: Break managed-resources import cycle on Python 3.13
The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module
ImportError when loading the managed files hook chain:

  litellm.proxy.hooks/__init__ (mid-import)
    -> enterprise.enterprise_hooks
    -> litellm_enterprise.proxy.hooks.managed_files
    -> litellm.llms.base_llm.managed_resources.isolation
    -> litellm.proxy.management_endpoints.common_utils
    -> litellm.proxy.utils  (re-enters litellm.proxy.hooks)

The except ImportError block in hooks/__init__.py silently swallowed the
failure, leaving managed_files unregistered and POST /files returning
500 "Managed files hook not found".

Two-layer fix:
- Inline the 3-line _user_has_admin_view check in isolation.py instead
  of importing it from litellm.proxy.management_endpoints.common_utils.
  litellm.llms.* should not depend on litellm.proxy.* — removing this
  layering violation breaks the cycle at its root.
- Define PROXY_HOOKS and get_proxy_hook before the conditional
  enterprise import in litellm/proxy/hooks/__init__.py, so any future
  re-entry resolves the public names instead of hitting an
  ImportError on a partially-initialized module.

Also fold in two unrelated CCI repairs surfaced in the same staging run:
- tests/otel_tests/test_key_logging_callbacks.py: per-key
  gcs_bucket_name / gcs_path_service_account are now stripped by
  initialize_dynamic_callback_params, so the GCS client falls through
  to the env-only branch. Update the assertion to match the new
  "GCS_BUCKET_NAME is not set" message.
- .circleci/config.yml: tests/pass_through_tests now resolves
  google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump,
  which uses dynamic ESM imports Jest 29 cannot load without
  --experimental-vm-modules. Pass that flag in the Vertex JS test step.

Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a
regression guard: managed_files / managed_vector_stores must register,
and isolation.py must not transitively import litellm.proxy.utils.
2026-05-04 20:05:24 -07:00
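Commit 727ab8dcc4 adds a regression guard asserting that isolation.py does not transitively import litellm.proxy.utils, but it does not show how the check is written. One possible approach is sketched below as an assumption, not as the contents of test_proxy_hooks_init.py. It imports the module in a fresh interpreter and inspects `sys.modules`, so modules already loaded in the test process cannot mask the result. `transitively_imports` is a hypothetical helper name.

```python
import subprocess
import sys


def transitively_imports(module: str, forbidden: str) -> bool:
    """Import `module` in a fresh interpreter; report whether `forbidden` got loaded too."""
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"print({forbidden!r} in sys.modules)\n"
    )
    # check=True turns an ImportError in the child process into CalledProcessError.
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

In the LiteLLM test suite this could be used as `assert not transitively_imports("litellm.llms.base_llm.managed_resources.isolation", "litellm.proxy.utils")`.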
Yuneng Jiang
7c8409d013
chore: update Next.js build artifacts (2026-05-05 02:13 UTC, node v20.20.2)
2026-05-04 19:13:25 -07:00
yuneng-jiang
9ea824d5bf
Merge pull request #27143 from BerriAI/cursor/fix-secret-fields-in-spend-logs-a532
fix(security): prevent secret_fields from leaking into spend logs
2026-05-04 19:07:54 -07:00
yuneng-jiang
be5f217aaf
Merge pull request #26861 from BerriAI/litellm_fix_scim_virtual_key_deactivation
fix(scim): revoke virtual keys when SCIM deprovisions a user
2026-05-04 19:03:55 -07:00
Cursor Agent
5923c3209b
fix(security): prevent secret_fields from leaking into spend logs
secret_fields (containing raw HTTP headers including Authorization
Bearer tokens) was being included in proxy_server_request['body']
because the body snapshot was a copy.copy(data) of the full request
dict. This body gets serialized and persisted in the LiteLLM_SpendLogs
table, exposing user credentials in the database.

Root cause: data['secret_fields'] was set before the body snapshot at
data['proxy_server_request']['body'] = copy.copy(data), so the full
raw headers (including auth tokens) ended up in the snapshot.

Fix (defense in depth):
1. Exclude 'secret_fields' when creating the body snapshot in
   litellm_pre_call_utils.py (primary fix)
2. Strip 'secret_fields' in _sanitize_request_body_for_spend_logs_payload
   as a secondary safeguard

secret_fields remains available on the live data dict for legitimate
downstream consumers (MCP, Responses API).

Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-05-05 02:01:41 +00:00
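The two layers described in 5923c3209b are sketched below. The first excludes the key when the body snapshot is taken. The second strips it again just before the spend-log payload is serialized. The function names are hypothetical stand-ins for the code in litellm_pre_call_utils.py and `_sanitize_request_body_for_spend_logs_payload`, whose exact shapes the commit does not show.

```python
import copy
from typing import Any, Dict

# Keys that must never be persisted with the request snapshot.
SECRET_KEYS = frozenset({"secret_fields"})


def build_body_snapshot(data: Dict[str, Any]) -> Dict[str, Any]:
    """Layer 1: a shallow copy of the request (like copy.copy(data)) minus secret keys."""
    return {key: value for key, value in data.items() if key not in SECRET_KEYS}


def sanitize_for_spend_logs(body: Dict[str, Any]) -> Dict[str, Any]:
    """Layer 2: drop the keys again right before the payload is serialized."""
    cleaned = copy.copy(body)
    for key in SECRET_KEYS:
        cleaned.pop(key, None)
    return cleaned
```

The live request dict keeps `secret_fields` for downstream consumers. Only the persisted snapshot loses it, even when the key was set before the snapshot was taken.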
yuneng-jiang
555a8131fe
Merge pull request #26951 from stuxf/codex/skills-containers-tenant-guard
chore(proxy): tighten resource ownership checks
2026-05-04 18:47:17 -07:00
yuneng-jiang
2f305050ce
Merge pull request #27004 from stuxf/fix/managed-resource-service-account-isolation
fix(proxy): isolate managed resources for service-account API keys
2026-05-04 18:45:55 -07:00
user
3dcb6bd3f9
Merge remote-tracking branch 'upstream/litellm_internal_staging' into codex/skills-containers-tenant-guard
# Conflicts:
#	litellm/proxy/auth/auth_utils.py
2026-05-05 01:41:25 +00:00
user
7faba9656f
Merge remote-tracking branch 'upstream/litellm_internal_staging' into fix/managed-resource-service-account-isolation
2026-05-05 01:38:11 +00:00
yuneng-jiang
281296f9cf
Merge pull request #27151 from BerriAI/litellm_yj_may4
[Infra] Merge dev branch
2026-05-04 18:29:52 -07:00
user
aee064ad37
Merge remote-tracking branch 'upstream/litellm_internal_staging' into fix/managed-resource-service-account-isolation
2026-05-05 01:29:05 +00:00
yuneng-jiang
dcb357ee2d
Merge pull request #27149 from BerriAI/litellm_/peaceful-bell-ba8ca5
[Fix] Tests: Replace deprecated openrouter/claude-3.7-sonnet with claude-sonnet-4.5
2026-05-04 18:27:45 -07:00
yuneng-jiang
efca16ccfa
Merge pull request #27043 from stuxf/fix/ssti-prompt-managers
fix(security): sandbox jinja2 in gitlab/arize/bitbucket prompt managers
2026-05-04 18:23:41 -07:00
Yuneng Jiang
e35cd5af76
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_may4
2026-05-04 18:22:47 -07:00
Yuneng Jiang
7f550a5d67
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/peaceful-bell-ba8ca5
2026-05-04 18:21:33 -07:00
Yassin Kortam
db2a3cafb6
Merge pull request #27131 from BerriAI/litellm_fix/routing-groups-ui
feat: routing groups ui
2026-05-04 18:16:49 -07:00
mateo-berri
4179159f0f
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_azure-deployment-image-body
2026-05-04 18:16:46 -07:00
Yassin Kortam
a56256e5ee
feat: routing groups ui
2026-05-04 18:09:14 -07:00
yuneng-jiang
42cd9493e9
Merge pull request #27071 from stuxf/fix/strip-pricing-fields
chore(proxy): drop client-supplied pricing fields from request bodies
2026-05-04 18:08:41 -07:00
yuneng-jiang
68c120a68f
Merge pull request #26957 from stuxf/chore/guardrail-coverage
chore(guardrails): cover multimodal + Responses-API content shapes
2026-05-04 18:01:27 -07:00
Yuneng Jiang
00d0c3e745
Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_/peaceful-bell-ba8ca5
2026-05-04 17:51:54 -07:00
Yuneng Jiang
22782f3c3f
[Fix] Tests: Replace deprecated openrouter/claude-3.7-sonnet with claude-sonnet-4.5
OpenRouter has dropped active endpoints for anthropic/claude-3.7-sonnet,
causing test_reasoning_content_completion to fail with a 404 "No endpoints
found" error. Switch to anthropic/claude-sonnet-4.5, which is current and
supports reasoning streaming.
2026-05-04 17:51:50 -07:00
user
4699b3dc81
chore(container): use delete_cache, json-encode scope key, clean test
/simplify follow-ups:

* Replace the two-``pop`` reach into ``cache_dict``/``ttl_dict`` with
  the existing public ``InMemoryCache.delete_cache(key)`` — the same
  idiom used elsewhere in the proxy. Bonus: ``delete_cache`` calls
  ``_remove_key`` which also handles ``expiration_heap`` consistency
  the direct pops were silently leaking.

* JSON-encode the sorted scope list for the cache key instead of
  ``"|".join``. ``user_id`` / ``team_id`` / ``org_id`` / ``api_key``
  are free-form strings and could contain a literal ``|`` — JSON
  quoting escapes any in-string separator unambiguously.

* Extract ``_allowed_container_ids_cache_key()`` so the read and
  invalidation sites compute the key the same way.

* Fix a placeholder-then-overwrite test construction: the
  ``__module__.split(".")[0] and "proxy_admin"`` line evaluated to a
  literal string that was immediately overwritten with the real enum
  value. Hoist the import and construct directly.
2026-05-05 00:43:47 +00:00
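The reason 4699b3dc81 JSON-encodes the sorted scope list can be shown in a few lines. With `"|".join`, a single scope containing `|` yields the same key as two separate scopes. JSON quoting keeps each scope's boundaries intact. `join_key` and `json_key` are illustrative names; the real helper is `_allowed_container_ids_cache_key()`, and its signature is not shown in the commit.

```python
import json
from typing import Iterable


def join_key(scopes: Iterable[str]) -> str:
    # Ambiguous: a "|" inside a scope is indistinguishable from the separator.
    return "|".join(sorted(scopes))


def json_key(scopes: Iterable[str]) -> str:
    # Unambiguous: every scope becomes a quoted JSON string inside a list.
    return json.dumps(sorted(scopes))
```

For example, `join_key(["a|b"])` and `join_key(["a", "b"])` both return `a|b`, so two different callers would share a cache entry. `json_key` returns `["a|b"]` and `["a", "b"]` respectively. Sorting first keeps the key independent of the order in which scopes were derived.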
user
2adfa96db2
fix(container): cache list-allow-set, track admin-created containers
Address Greptile P2 follow-ups from the prior round:

* Cache ``_get_allowed_container_ids`` (60s LRU/TTL keyed by sorted
  owner-scope tuple) so ``GET /v1/containers`` doesn't issue a fresh
  ``find_many`` against ``litellm_managedobjecttable`` on every list
  call. Invalidate the caller's own cache entry when they record a
  new owner so the just-created container shows up on their next list.

* Tighten the admin early-return in ``record_container_owner`` to skip
  ONLY when there's literally no container ID to stamp. An admin with
  identity (the master-key path populates ``user_id`` + ``api_key``)
  flows through the normal record path so admin-created containers are
  tracked like any other caller's. The truly-identity-less admin case
  still falls through to the 403 below — correct fail-secure default.

Skill-cache invalidation gap (also flagged by Greptile) is moot: there
is no skill update endpoint exposed; ownership-affecting mutations are
only delete (already invalidates) and create (new ID, no cache entry
to update).
2026-05-05 00:39:53 +00:00
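The caching described in 2adfa96db2 is sketched below: a 60 s TTL keyed by the sorted owner-scope tuple, plus invalidation of the caller's own entry after a new owner is recorded. The class and function names are hypothetical. The clock is injectable so expiry can be tested, and the max-size/LRU bound the commit mentions is left out for brevity.

```python
import time
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Tuple

CacheKey = Tuple[str, ...]


class AllowedIdsCache:
    """TTL cache for per-caller allow-sets (hypothetical sketch, no size bound)."""

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, FrozenSet[str]]] = {}

    @staticmethod
    def key(scopes: Iterable[str]) -> CacheKey:
        # Sorting makes the key independent of scope order.
        return tuple(sorted(scopes))

    def get(self, scopes: Iterable[str]) -> Optional[FrozenSet[str]]:
        k = self.key(scopes)
        entry = self._entries.get(k)
        if entry is None:
            return None
        expires_at, ids = entry
        if self._clock() >= expires_at:
            del self._entries[k]  # expired: treat as a miss
            return None
        return ids

    def set(self, scopes: Iterable[str], ids: Iterable[str]) -> None:
        self._entries[self.key(scopes)] = (self._clock() + self._ttl, frozenset(ids))

    def invalidate(self, scopes: Iterable[str]) -> None:
        # Called after recording a new owner so the next list call re-queries the DB.
        self._entries.pop(self.key(scopes), None)


def list_allowed(
    cache: AllowedIdsCache,
    scopes: Iterable[str],
    load_from_db: Callable[[CacheKey], Iterable[str]],
) -> FrozenSet[str]:
    key_scopes = AllowedIdsCache.key(scopes)  # materialize once; scopes may be a generator
    cached = cache.get(key_scopes)
    if cached is not None:  # an empty frozenset is still a valid cached answer
        return cached
    ids = frozenset(load_from_db(key_scopes))
    cache.set(key_scopes, ids)
    return ids
```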
user
6ce84effe1
chore: simplify ownership tracking — drop thin stores, in-memory fallback, hand-rolled cache
Substantial reduction (~765 LOC) without changing the security
boundary:

* Drop ContainerOwnershipStore and LiteLLMSkillsStore — both were
  one-method-per-Prisma-call wrappers. Inline the calls instead,
  matching the established pattern in vector_store_endpoints,
  agent_endpoints, and mcp_server/db.py.

* Drop the prisma_client is None in-memory fallback. Production
  deploys always have Prisma; running ownership-critical paths on a
  process-local dict is a security footgun in the dev-mode case it
  was meant to support, and complicates every code path with a
  branch. Fail-secure: skip recording if Prisma is unavailable, and
  treat reads as "not found" (admin-only).

* Drop the hand-rolled module-level cache. Replace with the existing
  litellm.caching.in_memory_cache.InMemoryCache, which already has
  TTL + max-size + eviction tested in its own module. Sentinel string
  for negative caching since InMemoryCache can't disambiguate "miss"
  from "cached as None".

* Tests: drop coverage for removed code paths (in-memory fallback,
  hand-rolled cache internals). Keep tests for actual behavior (cache
  hit-rate, negative caching, owner check, list filtering,
  identity-less reject, admin bypass).
2026-05-05 00:23:32 +00:00
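The sentinel-based negative caching in 6ce84effe1 exists because a cache whose lookup returns None on a miss cannot also store None as a value. The sketch below uses a stand-in `SimpleCache` with that behaviour instead of the real `InMemoryCache` API. The sentinel value and the helper names are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# Stored in place of None to mean "looked up, no owner found" (hypothetical value).
NOT_FOUND = "__owner_not_found__"


class SimpleCache:
    """Stand-in with get-returns-None-on-miss semantics; not InMemoryCache itself."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value


def get_owner(
    cache: SimpleCache,
    resource_id: str,
    load_owner: Callable[[str], Optional[str]],
) -> Optional[str]:
    cached = cache.get(resource_id)
    if cached == NOT_FOUND:
        return None  # negative hit: skip the DB round-trip
    if cached is not None:
        return cached
    owner = load_owner(resource_id)
    # Cache misses too, so repeated lookups of unknown IDs stay cheap.
    cache.set(resource_id, NOT_FOUND if owner is None else owner)
    return owner
```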
user
83971a8712
fix(proxy): normalize managed resource team owner field
2026-05-04 17:05:50 -07:00
yuneng-jiang
de7175d6ab
Merge pull request #26912 from stuxf/codex/auth-sensitive-routes
chore(proxy): guard sensitive public endpoints
2026-05-04 17:04:10 -07:00
user
12fe945e7b
fix: keep skills handler FastAPI-free; fold gcs deny list into the body bouncer
Two cleanups:

* ``LiteLLMSkillsHandler.create_skill`` raised ``HTTPException`` for
  identity-less callers, importing FastAPI from a ``litellm/llms/``
  module — that violates the project rule that FastAPI lives only
  under ``proxy/``. Switch to ``ValueError`` (the same shape the rest
  of the handler uses for not-found/forbidden) and update the test.

* The proxy-auth body bouncer derived its observability ban list from
  ``_supported_callback_params`` only, missing
  ``_request_blocked_callback_params`` (where ``gcs_bucket_name`` and
  ``gcs_path_service_account`` live). Two recently-merged sibling PRs
  (#27019 added the deny list, #27081 added the test asserting these
  are rejected at the request body root) crossed without folding them
  together. Union the GCS deny list into the bouncer's derivation so
  the single source of truth covers both code paths.
2026-05-04 23:54:33 +00:00
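The fix in 12fe945e7b derives the bouncer's ban list from the union of both parameter lists, so one source of truth covers both code paths. A minimal sketch follows. The GCS names come from the commit. `example_callback_secret` is an illustrative placeholder, because the real contents of `_supported_callback_params` are not listed here. The helper names are hypothetical.

```python
from typing import Any, Dict, FrozenSet, Iterable, List

# Placeholder for _supported_callback_params (real contents not shown in the commit).
SUPPORTED_CALLBACK_PARAMS = ["example_callback_secret"]
# Stand-in for _request_blocked_callback_params, using the names from the commit.
REQUEST_BLOCKED_CALLBACK_PARAMS = ["gcs_bucket_name", "gcs_path_service_account"]


def banned_body_keys(*sources: Iterable[str]) -> FrozenSet[str]:
    # Union every source so a param added to either list is banned automatically.
    banned = set()
    for source in sources:
        banned.update(source)
    return frozenset(banned)


BANNED = banned_body_keys(SUPPORTED_CALLBACK_PARAMS, REQUEST_BLOCKED_CALLBACK_PARAMS)


def rejected_root_keys(body: Dict[str, Any]) -> List[str]:
    """Banned keys present at the request-body root, sorted for stable error messages."""
    return sorted(key for key in body if key in BANNED)
```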
user
abcf204d38
fix(proxy): include request-blocked callback params in auth bans
2026-05-04 16:54:04 -07:00
user
b5a14f22d6
Merge remote-tracking branch 'upstream/litellm_internal_staging' into codex/skills-containers-tenant-guard
2026-05-04 23:50:29 +00:00
user
6a3f6b47de
Merge remote-tracking branch 'origin/litellm_internal_staging' into fix/strip-pricing-fields-pr27071
# Conflicts:
#	litellm/proxy/litellm_pre_call_utils.py
2026-05-04 16:45:21 -07:00
user
777862a018
Merge remote-tracking branch 'upstream/litellm_internal_staging' into codex/skills-containers-tenant-guard 2026-05-04 23:40:26 +00:00
user
758b488326
fix(ownership): reject identity-less callers instead of sharing a sentinel scope
UNSCOPED_RESOURCE_OWNER_SCOPE collapsed every caller without an
identity field (no user_id / team_id / org_id / api_key / token) into
a single shared owner — a cross-tenant access primitive: any two such
callers could see and delete each other's containers and skills.

Drop the sentinel. ``get_primary_resource_owner_scope`` returns
``None`` and ``get_resource_owner_scopes`` returns ``[]`` for
identity-less callers. ``record_container_owner`` and
``LiteLLMSkillsHandler.create_skill`` now reject creates from
identity-less callers with a 403 instead of stamping the placeholder.
Read paths already deny ``owner is None`` correctly so legacy rows
(if any) are admin-only.
2026-05-04 23:40:22 +00:00
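The behaviour described in 758b488326 is sketched below. An identity-less caller gets no owner scope, and creates are rejected instead of being stamped with a shared placeholder. The helpers only mirror what the commit says about `get_resource_owner_scopes`, `get_primary_resource_owner_scope` and `record_container_owner`. The scope string format, the field priority order and `ForbiddenError` are all assumptions.

```python
from typing import Any, Dict, List, Optional

# Identity fields listed in the commit message; the priority order is an assumption.
IDENTITY_FIELDS = ("user_id", "team_id", "org_id", "api_key", "token")


class ForbiddenError(Exception):
    """Stand-in for the proxy's 403 response."""


def resource_owner_scopes(caller: Dict[str, Any]) -> List[str]:
    # An identity-less caller gets [] rather than a shared sentinel scope.
    return [f"{field}:{caller[field]}" for field in IDENTITY_FIELDS if caller.get(field)]


def primary_owner_scope(caller: Dict[str, Any]) -> Optional[str]:
    scopes = resource_owner_scopes(caller)
    return scopes[0] if scopes else None


def record_owner(caller: Dict[str, Any], resource_id: str, owners: Dict[str, str]) -> None:
    owner = primary_owner_scope(caller)
    if owner is None:
        # Reject instead of stamping a scope that every identity-less caller would share.
        raise ForbiddenError(f"refusing to create {resource_id!r}: caller has no identity")
    owners[resource_id] = owner
```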
user
de682c810e
chore(container,skills): drop legacy-access opt-out env vars
LITELLM_ALLOW_UNTRACKED_CONTAINER_ACCESS and
LITELLM_ALLOW_UNOWNED_SKILL_ACCESS were operator-toggleable opt-outs
for the cross-tenant access primitive this PR closes — flipping either
on re-enabled exactly the VERIA-20 read path. Default-secure with no
escape hatch matches sibling fixes (vector-store cred isolation, semantic
cache key isolation, user_config strip): all rejected the
opt-out-of-security pattern.

Untracked containers and unowned skills (rows that pre-date this
enforcement) are admin-only. Non-admin owners need to either re-create
via the now-tracked flow or have an admin assign ``created_by`` on the
existing row. Update tests to assert the strict-only behaviour.
2026-05-04 23:22:19 +00:00
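With the opt-out env vars removed in de682c810e, the read-side rule reduces to a strict check. Admins see everything, rows without an owner are admin-only, and everyone else needs a matching scope. `can_access` is a hypothetical helper, and the scope strings are illustrative.

```python
from typing import Iterable, Optional


def can_access(owner: Optional[str], caller_scopes: Iterable[str], is_admin: bool) -> bool:
    """Strict read check with no legacy-access escape hatch."""
    if is_admin:
        return True
    if owner is None:
        # Untracked/unowned rows that pre-date enforcement are admin-only.
        return False
    return owner in set(caller_scopes)
```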
yuneng-jiang
07824b5eec
Merge pull request #26990 from stuxf/codex/semantic-cache-tenant-isolation
chore(caching): isolate semantic cache entries
2026-05-04 16:02:43 -07:00
yuneng-jiang
0c0b5e005f
Merge pull request #27082 from stuxf/fix/vector-store-cred-leak
fix(vector_store): resolve embedding config at request time, never persist creds
2026-05-04 15:55:40 -07:00
user
ec9b84d38c
chore(container,skills): LRU eviction for owner caches; widen file_purpose Literal
Two cleanups from the /simplify pass:

* ``_CONTAINER_OWNER_CACHE`` and ``_SKILL_CACHE`` now LRU-evict via
  ``OrderedDict.popitem(last=False)`` instead of full ``clear()`` at
  capacity. Full clears converted a steady-state cached workload into a
  periodic full-DB-load oscillation as the cache repopulated from zero
  and cleared again. Reads now ``move_to_end`` so the just-touched
  entry survives the next eviction. Mirrors the pre-existing LRU
  pattern in ``_remember_container_owner``.

* ``LiteLLM_ManagedObjectTable.file_purpose`` Literal now includes
  ``"container"`` so Pydantic validation accepts rows written by the
  ownership store.
2026-05-04 22:52:54 +00:00
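The eviction change in ec9b84d38c is the standard `OrderedDict` LRU pattern. Reads call `move_to_end`, and inserts past capacity evict one entry with `popitem(last=False)` instead of clearing the whole dict. The class wrapper below is hypothetical; the actual `_CONTAINER_OWNER_CACHE` and `_SKILL_CACHE` are module-level. A later commit in this log, 6ce84effe1, replaced these hand-rolled caches with InMemoryCache.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Bounded cache that evicts only the least recently used entry at capacity."""

    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # drop the oldest entry, keep the rest warm
```

Unlike a full `clear()`, this keeps a steady-state workload mostly cached. Each insert past capacity costs one eviction rather than a reload of the whole working set.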
yuneng-jiang
e4ac46b5d1
Merge pull request #27081 from stuxf/fix/strip-callback-fields
chore(proxy): close callback-config and observability-credential side channels
2026-05-04 15:45:42 -07:00
user
4fa577810b
fix(container): keep ownership-filter exceptions out of the LLM-error path
filter_container_list_response runs after the upstream call has
already succeeded; treating an ownership-lookup failure as an LLM-API
error fires post_call_failure_hook for a successful upstream call and
returns a misleading provider-shaped error to the client. Run the
filter outside the try/except so genuine LLM errors stay scoped to
the upstream call.
2026-05-04 22:43:18 +00:00
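The restructuring in 4fa577810b comes down to the try/except scope shown below. Only the upstream call sits inside it, so a failure in the ownership filter never reaches the LLM failure hook. The callables are hypothetical stand-ins for the upstream request, `filter_container_list_response` and `post_call_failure_hook`, whose real signatures are not shown.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def call_then_filter(
    call_upstream: Callable[[], Any],
    filter_response: Callable[[Any], T],
    on_llm_failure: Callable[[BaseException], None],
) -> T:
    try:
        response = call_upstream()
    except Exception as exc:
        # Only genuine upstream failures reach the LLM failure hook.
        on_llm_failure(exc)
        raise
    # The upstream call already succeeded; an ownership-lookup error raised here
    # propagates unchanged instead of being reported as a provider error.
    return filter_response(response)
```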