/metrics now requires auth by default; tests/otel_tests/test_prometheus.py
makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so
every prometheus test in CI now fails the metric assertion.
Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml
to opt out for this test job, which scrapes /metrics directly. Verified
locally: 8/8 prometheus tests green (one flaky retry on
test_proxy_success_metrics that pre-dates this PR).
Also drop the -x stop-on-first-failure flag from the otel test command
so all failures in the job surface in a single CI run rather than
hiding behind whichever one trips first.
The Azure o-series tests were excluded from the conftest's VCR auto-marker
because of a respx/vcrpy transport-patching conflict, but the only respx
reference in the file was an unused `MockRouter` import. Drop the dead
import and remove the file from the conflict set so cassettes record on
first run and replay thereafter, eliminating the 60-95s live Azure latency
that was crashing xdist workers under --timeout=120 thread-mode timeouts.
The /otel-spans endpoint returns process-wide spans and tags
most_recent_parent by max start_time. After tightening that route to
proxy_admin (sk-1234), the GET /otel-spans request itself emits auth
spans that beat the chat-completion spans on start_time, so
most_recent_parent now points at the request's own auth trace
(['postgres', 'postgres']) and the >=5-span assertion fails.
Pick the chat-completion trace by content: it is the only trace whose
span list is a superset of {postgres, redis, raw_gen_ai_request,
batch_write_to_db}. Verified locally end-to-end against
otel_test_config.yaml + OTEL_EXPORTER=in_memory: 3/3 runs green.
/otel-spans now requires proxy admin (returns 401 'Only proxy admin
can be used to generate, delete, update info for new keys/users/teams.
Route=/otel-spans' for non-admin callers). Switch the GET call to use
the master key sk-1234 while keeping the generated key for the
chat-completion request that produces the spans.
- Add image_generation/http_utils.azure_deployment_image_generation_json_body; call
from azure.py (keeps AzureChatCompletion focused on chat).
- Rename finalize_image_edit_multipart_data to finalize_image_edit_request_data with
docstring covering multipart and JSON POST payloads (review feedback).
Co-authored-by: Cursor <cursoragent@cursor.com>
Ruff F401 flagged the aliased import as unused within common_utils.py
because the name is consumed only by external modules (~15 callers
across guardrails, spend tracking, MCP, agents, management endpoints).
Add `# noqa: F401 re-exported` so the alias survives lint while
keeping a single source of truth in litellm.proxy._types.
- Move _user_has_admin_view to litellm.proxy._types as
user_api_key_has_admin_view (single source of truth). common_utils.py
and isolation.py both import from there now, removing the duplicated
role-check that could silently diverge if new admin roles are added.
- Add pytest.importorskip("litellm_enterprise") to the two regression
tests that assert managed_files / managed_vector_stores are registered;
those keys come from ENTERPRISE_PROXY_HOOKS so the tests would fail
unconditionally in a checkout without the enterprise extra installed.
The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module
ImportError when loading the managed files hook chain:
litellm.proxy.hooks/__init__ (mid-import)
-> enterprise.enterprise_hooks
-> litellm_enterprise.proxy.hooks.managed_files
-> litellm.llms.base_llm.managed_resources.isolation
-> litellm.proxy.management_endpoints.common_utils
-> litellm.proxy.utils (re-enters litellm.proxy.hooks)
The except ImportError block in hooks/__init__.py silently swallowed the
failure, leaving managed_files unregistered and POST /files returning
500 "Managed files hook not found".
Two-layer fix:
- Inline the 3-line _user_has_admin_view check in isolation.py instead
of importing it from litellm.proxy.management_endpoints.common_utils.
litellm.llms.* should not depend on litellm.proxy.* — removing this
layering violation breaks the cycle at its root.
- Define PROXY_HOOKS and get_proxy_hook before the conditional
enterprise import in litellm/proxy/hooks/__init__.py, so any future
re-entry resolves the public names instead of hitting an
ImportError on a partially-initialized module.
Also fold in two unrelated CCI repairs surfaced in the same staging run:
- tests/otel_tests/test_key_logging_callbacks.py: per-key
gcs_bucket_name / gcs_path_service_account are now stripped by
initialize_dynamic_callback_params, so the GCS client falls through
to the env-only branch. Update the assertion to match the new
"GCS_BUCKET_NAME is not set" message.
- .circleci/config.yml: tests/pass_through_tests now resolves
google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump,
which uses dynamic ESM imports Jest 29 cannot load without
--experimental-vm-modules. Pass that flag in the Vertex JS test step.
Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a
regression guard: managed_files / managed_vector_stores must register,
and isolation.py must not transitively import litellm.proxy.utils.
secret_fields (containing raw HTTP headers including Authorization
Bearer tokens) was being included in proxy_server_request['body']
because the body snapshot was a copy.copy(data) of the full request
dict. This body gets serialized and persisted in the LiteLLM_SpendLogs
table, exposing user credentials in the database.
Root cause: data['secret_fields'] was set before the body snapshot at
data['proxy_server_request']['body'] = copy.copy(data), so the full
raw headers (including auth tokens) ended up in the snapshot.
Fix (defense in depth):
1. Exclude 'secret_fields' when creating the body snapshot in
litellm_pre_call_utils.py (primary fix)
2. Strip 'secret_fields' in _sanitize_request_body_for_spend_logs_payload
as a secondary safeguard
secret_fields remains available on the live data dict for legitimate
downstream consumers (MCP, Responses API).
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
OpenRouter has dropped active endpoints for anthropic/claude-3.7-sonnet,
causing test_reasoning_content_completion to fail with a 404 "No endpoints
found" error. Switch to anthropic/claude-sonnet-4.5, which is current and
supports reasoning streaming.
/simplify follow-ups:
* Replace the two-``pop`` reach into ``cache_dict``/``ttl_dict`` with
the existing public ``InMemoryCache.delete_cache(key)`` — the same
idiom used elsewhere in the proxy. Bonus: ``delete_cache`` calls
``_remove_key`` which also handles ``expiration_heap`` consistency
the direct pops were silently leaking.
* JSON-encode the sorted scope list for the cache key instead of
``"|".join``. ``user_id`` / ``team_id`` / ``org_id`` / ``api_key``
are free-form strings and could contain a literal ``|`` — JSON
quoting escapes any in-string separator unambiguously.
* Extract ``_allowed_container_ids_cache_key()`` so the read and
invalidation sites compute the key the same way.
* Fix a placeholder-then-overwrite test construction: the
``__module__.split(".")[0] and "proxy_admin"`` line evaluated to a
literal string that was immediately overwritten with the real enum
value. Hoist the import and construct directly.
Address Greptile P2 follow-ups from the prior round:
* Cache ``_get_allowed_container_ids`` (60s LRU/TTL keyed by sorted
owner-scope tuple) so ``GET /v1/containers`` doesn't issue a fresh
``find_many`` against ``litellm_managedobjecttable`` on every list
call. Invalidate the caller's own cache entry when they record a
new owner so the just-created container shows up on their next list.
* Tighten the admin early-return in ``record_container_owner`` to skip
ONLY when there's literally no container ID to stamp. An admin with
identity (the master-key path populates ``user_id`` + ``api_key``)
flows through the normal record path so admin-created containers are
tracked like any other caller's. The truly-identity-less admin case
still falls through to the 403 below — correct fail-secure default.
Skill-cache invalidation gap (also flagged by Greptile) is moot: there
is no skill update endpoint exposed; ownership-affecting mutations are
only delete (already invalidates) and create (new ID, no cache entry
to update).
Substantial reduction (~765 LOC) without changing the security
boundary:
* Drop ContainerOwnershipStore and LiteLLMSkillsStore — both were
one-method-per-Prisma-call wrappers. Inline the calls instead,
matching the established pattern in vector_store_endpoints,
agent_endpoints, and mcp_server/db.py.
* Drop the prisma_client is None in-memory fallback. Production
deploys always have Prisma; running ownership-critical paths on a
process-local dict is a security footgun in the dev-mode case it
was meant to support, and complicates every code path with a
branch. Fail-secure: skip recording if Prisma is unavailable, and
treat reads as "not found" (admin-only).
* Drop the hand-rolled module-level cache. Replace with the existing
litellm.caching.in_memory_cache.InMemoryCache, which already has
TTL + max-size + eviction tested in its own module. Sentinel string
for negative caching since InMemoryCache can't disambiguate "miss"
from "cached as None".
* Tests: drop coverage for removed code paths (in-memory fallback,
hand-rolled cache internals). Keep tests for actual behavior (cache
hit-rate, negative caching, owner check, list filtering,
identity-less reject, admin bypass).
Two cleanups:
* ``LiteLLMSkillsHandler.create_skill`` raised ``HTTPException`` for
identity-less callers, importing FastAPI from a ``litellm/llms/``
module — that violates the project rule that FastAPI lives only
under ``proxy/``. Switch to ``ValueError`` (the same shape the rest
of the handler uses for not-found/forbidden) and update the test.
* The proxy-auth body bouncer derived its observability ban list from
``_supported_callback_params`` only, missing
``_request_blocked_callback_params`` (where ``gcs_bucket_name`` and
``gcs_path_service_account`` live). Two recently-merged sibling PRs
(#27019 added the deny list, #27081 added the test asserting these
are rejected at the request body root) crossed without folding them
together. Union the GCS deny list into the bouncer's derivation so
the single source of truth covers both code paths.
UNSCOPED_RESOURCE_OWNER_SCOPE collapsed every caller without an
identity field (no user_id / team_id / org_id / api_key / token) into
a single shared owner — a cross-tenant access primitive: any two such
callers could see and delete each other's containers and skills.
Drop the sentinel. ``get_primary_resource_owner_scope`` returns
``None`` and ``get_resource_owner_scopes`` returns ``[]`` for
identity-less callers. ``record_container_owner`` and
``LiteLLMSkillsHandler.create_skill`` now reject creates from
identity-less callers with a 403 instead of stamping the placeholder.
Read paths already deny ``owner is None`` correctly so legacy rows
(if any) are admin-only.
LITELLM_ALLOW_UNTRACKED_CONTAINER_ACCESS and
LITELLM_ALLOW_UNOWNED_SKILL_ACCESS were operator-toggleable opt-outs
for the cross-tenant access primitive this PR closes — flipping either
on re-enabled exactly the VERIA-20 read path. Default-secure with no
escape hatch matches sibling fixes (vector-store cred isolation, semantic
cache key isolation, user_config strip): all rejected the
opt-out-of-security pattern.
Untracked containers and unowned skills (rows that pre-date this
enforcement) are admin-only. Non-admin owners need to either re-create
via the now-tracked flow or have an admin assign ``created_by`` on the
existing row. Update tests to assert the strict-only behaviour.
Two cleanups from the /simplify pass:
* ``_CONTAINER_OWNER_CACHE`` and ``_SKILL_CACHE`` now LRU-evict via
``OrderedDict.popitem(last=False)`` instead of full ``clear()`` at
capacity. Full clears converted a steady-state cached workload into a
periodic full-DB-load oscillation as the cache repopulated from zero
and cleared again. Reads now ``move_to_end`` so the just-touched
entry survives the next eviction. Mirrors the pre-existing LRU
pattern in ``_remember_container_owner``.
* ``LiteLLM_ManagedObjectTable.file_purpose`` Literal now includes
``"container"`` so Pydantic validation accepts rows written by the
ownership store.
filter_container_list_response runs after the upstream call has
already succeeded; treating an ownership-lookup failure as an LLM-API
error fires post_call_failure_hook for a successful upstream call and
returns a misleading provider-shaped error to the client. Run the
filter outside the try/except so genuine LLM errors stay scoped to
the upstream call.