Litellm oss staging 250526 (#28770)

* fix(mcp): handle OAuth IdP error responses in /callback (LIT-2750)

Per RFC 6749 section 4.1.2.1, when the IdP rejects an OAuth authorization
request it redirects back to the client with ?error=...&error_description=...
and no code. The MCP /callback handler declared code and state as required
query params, so FastAPI rejected such error responses with a 422 before
the handler ran -- stranding the MCP client waiting on the loopback.

This change:
- Makes code and state optional and accepts the RFC-defined error,
  error_description, and error_uri params.
- When state decodes to a trusted client redirect_uri, propagates the
  error params back to that URI with the client's original (un-wrapped)
  state preserved, so the client's OAuth library can surface the failure.
- When state is missing/undecryptable or the encoded redirect_uri is no
  longer trusted, renders a 400 HTML page with the (HTML-escaped) error
  details instead of leaking to an attacker-controlled redirect.
- Preserves the existing success path (code + state -> 302 to validated
  client redirect_uri with original state).
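
A minimal sketch of the branching described above. It assumes the handler receives query params as a dict and can call two helpers: `decode_state`, which returns the decoded state dict or None when state is missing or undecryptable, and `is_trusted_redirect`. All of these names, and the `redirect_uri`/`original_state` keys, are hypothetical. This is not the LiteLLM handler. It only illustrates the RFC 6749 section 4.1.2.1 error path next to the existing success path.

```python
import html
from typing import Callable, Optional, Tuple
from urllib.parse import urlencode


def handle_callback(
    params: dict,
    decode_state: Callable[[str], Optional[dict]],
    is_trusted_redirect: Callable[[str], bool],
) -> Tuple[int, str]:
    """Hypothetical /callback logic; returns (status, location or HTML body)."""
    code = params.get("code")
    state = params.get("state")
    error = params.get("error")

    # Missing and undecryptable state both collapse to None.
    decoded = decode_state(state) if state else None
    redirect_uri = decoded.get("redirect_uri") if decoded else None
    trusted = bool(redirect_uri) and is_trusted_redirect(redirect_uri)

    # Sketch assumption: redirect_uri carries no query string of its own.
    if error:
        if trusted:
            # Hand the IdP error, plus the client's original (un-wrapped)
            # state, back to the client's redirect_uri.
            out = {"error": error}
            for key in ("error_description", "error_uri"):
                if params.get(key):
                    out[key] = params[key]
            out["state"] = decoded.get("original_state", "")
            return 302, f"{redirect_uri}?{urlencode(out)}"
        # Unknown or untrusted destination: render escaped details inline.
        detail = html.escape(params.get("error_description") or "")
        return 400, f"<h1>OAuth error: {html.escape(error)}</h1><p>{detail}</p>"

    if code and trusted:
        # Success path: authorization code plus the client's original state.
        out = {"code": code, "state": decoded.get("original_state", "")}
        return 302, f"{redirect_uri}?{urlencode(out)}"

    return 400, "<h1>Missing or invalid code/state parameters</h1>"
```

The error path reuses the same trust check as the success path. As a result, an IdP error is never bounced to a destination the proxy would not also send an authorization code to.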

Fixes LIT-2750.

* test(mcp): regression tests for /callback handling IdP error responses (LIT-2750)

Adds a new test module covering the LIT-2750 fix: the MCP OAuth /callback
endpoint must accept IdP error responses (e.g. ?error=access_denied) per
RFC 6749 section 4.1.2.1 instead of returning a 422 because ``code`` is missing.

Coverage:
- IdP error with no state -> 400 HTML page surfacing the error.
- HTML escaping of user-controlled error / error_description fields.
- IdP error with a trusted (loopback) state -> 302 propagating
  error / error_description / original client state to the client.
- IdP error with an untrusted redirect_uri encoded in state -> 400 inline
  (no open-redirect to attacker-controlled origin).
- IdP error with an undecryptable state -> 400 HTML fallback.
- Bare GET /callback with no params -> 400 HTML (not Pydantic 422).
- Success path (code + state) still 302 to validated client redirect_uri
  with the original (un-wrapped) state preserved.

* refactor(mcp): drop unused _OAUTH_ERROR_PARAMS constant (Greptile P2)

The tuple was leftover scaffolding from an earlier draft of the LIT-2750
fix; nothing references it. The explanatory RFC 6749 §4.1.2.1 comment block
above the callback handler covers the same intent.

* fix(mcp/oauth): preserve empty original_state and clarify missing-param error in /callback

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* fix: apply black formatting to base_llm chat transformation

Fix CI black --check failure on is_thinking_enabled return formatting.

Co-authored-by: Cursor <cursoragent@cursor.com>

* merge main (#28836)

* fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body (#27526)

* Fix Bedrock KB pass-through SigV4 headers and signed body

Coerce botocore HeadersDict to a dict for pass-through routes. When
forward_headers is true, drop request headers that collide case-insensitively
with signed headers so client Bearer auth does not shadow AWS SigV4.
Send prepped.body as raw content so the outbound payload matches the
signature after logging hooks mutate the parsed dict.
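
A minimal sketch of the header rule, assuming both header sets are plain mappings. The commit coerces botocore's HeadersDict to a dict for the same reason. `merge_forwarded_headers` is a hypothetical name, not the proxy's function.

```python
from typing import Mapping


def merge_forwarded_headers(
    client_headers: Mapping[str, str], signed_headers: Mapping[str, str]
) -> dict:
    """Hypothetical helper: forward client headers without shadowing SigV4 ones."""
    signed = dict(signed_headers)
    signed_names = {name.lower() for name in signed}
    # Drop any client header whose name collides case-insensitively with a
    # signed header (e.g. a client "authorization: Bearer ..." header).
    merged = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in signed_names
    }
    # Signed headers are applied last so they always win.
    merged.update(signed)
    return merged
```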

Co-authored-by: Cursor <cursoragent@cursor.com>

* Simplify pass-through raw body handling

Read the SigV4-signed bytes directly from request.state inside
pass_through_request instead of threading a custom_raw_body argument
through three functions. Helper methods are restored to their original
signatures, and the new branch lives in one place at each httpx call site.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Harden pass-through raw body read from request.state

Guard missing request.state (test fixtures) and ignore non-bytes/str
values so MagicMock does not trigger the SigV4 raw-body path.
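
A sketch of the guarded read and the resulting httpx keyword choice. The attribute name `signed_body` and both function names are hypothetical. Only the behavior comes from the commits above: ignore a missing `request.state`, accept only bytes/str, and send the bytes via `content=` rather than `json=`.

```python
from types import SimpleNamespace
from typing import Any, Optional, Union

# Hypothetical attribute name; the commits do not name it.
RAW_BODY_ATTR = "signed_body"


def get_state_raw_body(request: Any) -> Optional[Union[bytes, str]]:
    """Return SigV4-signed bytes stashed on request.state, if any."""
    state = getattr(request, "state", None)  # test fixtures may lack .state
    if state is None:
        return None
    value = getattr(state, RAW_BODY_ATTR, None)
    # Only real bytes/str count; an auto-created MagicMock attribute must not.
    return value if isinstance(value, (bytes, str)) else None


def build_send_kwargs(request: Any, parsed_body: dict) -> dict:
    """Pick httpx kwargs: the exact signed bytes if present, else JSON."""
    raw = get_state_raw_body(request)
    if raw is not None:
        # content= sends the bytes verbatim, so they still match the signature
        # even if logging hooks mutated parsed_body afterwards.
        return {"content": raw}
    return {"json": parsed_body}


signed_request = SimpleNamespace(state=SimpleNamespace(signed_body=b'{"q": "hi"}'))
print(build_send_kwargs(signed_request, {"q": "mutated"}))  # {'content': b'{"q": "hi"}'}
```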

Co-authored-by: Cursor <cursoragent@cursor.com>

* Test pass_through_request state_raw_body uses httpx content=

Cover non-streaming (async_client.request) and streaming (build_request)
paths so SigV4 bytes on request.state are not replaced by json= of a
hook-mutated dict.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>

* chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728)

* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214

The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).

Changes:
  - Replace 26 hardcoded references to 888602223428 with 941277531214 across
    8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
    ARNs, batch execution role ARN, and example proxy config).
  - The provisioned-model and imported-model ARNs are referenced only from
    mocked unit tests — no AWS resources to recreate.
  - The batch execution IAM role has been recreated in the new account with
    the same name and equivalent permissions.
  - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
    hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
    under the same names — see tools/agentcore-deploy/ in a follow-up.

CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.

Smoke-tested locally against the new account:
  aws bedrock-runtime converse --region us-west-2 \
    --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
    --messages '[{"role":"user","content":[{"text":"ping"}]}]'
  → 200, model returned 'pong'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes

The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).

Deployed runtimes:
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
  arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy

Both runtimes are status=READY and pass a smoke invoke:
  $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
  → 200, {"result": "echo: ping"}

The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(tests): point Bedrock batch tests at new-account S3 bucket

The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.

Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): point live S3 logging test at new-account bucket

Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.

Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails

The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
  - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
    with explicit inputAction=ANONYMIZE so masking applies to INPUT,
    which is the source litellm's moderation hook sends)
  - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
    to the exact string the tests assert on)

Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously failing guardrail tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): migrate legacy models to current inference profiles

The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
  - anthropic.claude-3-sonnet-20240229    -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
  - anthropic.claude-3-haiku-20240307     -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).

cohere.command-r-plus has no working replacement (all Cohere models are
legacy-gated in the new account), so it was swapped for claude-haiku-4-5 in
provider-agnostic param lists. amazon.titan-image-generator tests are
skipped (no working replacement). Mocked, transformation, and cost tests
that reference the legacy model strings are intentionally left unchanged.
Verified live against the new account.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources

These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
  - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
    -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
  - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
    vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)

claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.

Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(bedrock): swap/skip legacy-gated models unavailable on new CI account

The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:

- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
  legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
  authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
  us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
  active us.anthropic.claude-sonnet-4-5 inference profile.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account

- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
  is not authorized on account 941277531214) and migrate the missed
  s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
  us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
  output e2e test.

https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa

* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)

Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
  instead of skipping, so the missing entitlement stays visible in CI; they
  still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
  batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
  transform + cost-tracking path stays under test without live model access

https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT

Co-authored-by: Claude <noreply@anthropic.com>

* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells

Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.
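
A sketch of the per-cell decision these commits converge on. It assumes a `ModelEntry` with a `fail_reason` field and detects credentials from the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` env vars. The dataclass shape and helper names are hypothetical. A real grid test would call `pytest.xfail(reason)` or `pytest.skip(reason)` instead of returning a string.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ModelEntry:
    """Hypothetical grid entry; only fail_reason is taken from the commit."""
    name: str
    fail_reason: Optional[str] = None


def has_aws_creds(env: Mapping[str, str]) -> bool:
    return bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))


def cell_outcome(entry: ModelEntry, env: Optional[Mapping[str, str]] = None) -> str:
    """Return "run", "xfail", or "skip" for one grid cell."""
    env = os.environ if env is None else env
    if entry.fail_reason:
        # Known-broken cell: XFAIL in CI (visible, not red), skipped locally
        # when no AWS credentials are configured.
        return "xfail" if has_aws_creds(env) else "skip"
    return "run"
```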

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(otel): export SERVER span on management-endpoint success without http_request (#28794)

Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>

* chore(ci): merge dev branch (#28801)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.
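
A minimal sketch of the idea, assuming `scope["path"]` may include the `root_path` prefix that the helper strips. `route_from_scope` is a hypothetical stand-in, not the actual `get_request_route`. It only shows that the route comes from the ASGI scope and never from the Host header.

```python
def route_from_scope(scope: dict) -> str:
    """Hypothetical stand-in: the ASGI path with root_path removed."""
    path = scope.get("path", "")
    root_path = scope.get("root_path", "") or ""
    # Strip root_path only on a segment boundary ("/api" must not eat "/apiary").
    if root_path and (path == root_path or path.startswith(root_path + "/")):
        path = path[len(root_path):] or "/"
    return path


# The Host header is never consulted, so a crafted Host cannot change the route.
scope = {
    "type": "http",
    "path": "/litellm/v1/models",
    "root_path": "/litellm",
    "headers": [(b"host", b"attacker.example")],
}
print(route_from_scope(scope))  # /v1/models
```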

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* chore(ci): merge dev branch (#28657)

* feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543)

* feat(dashboard): refine navbar zones and Agent Platform notice

Restructure the admin navbar for production users: clear product vs community
vs personal columns with vertical dividers, icon-only Slack/GitHub in a
shared chip, and Docs/Blog typography aligned on an 8px rhythm.

Add a notifications bell with a popover linking to the LiteLLM Agent Platform
repo and optional mark-as-read persistence.

Promote the account control with an initials avatar, a single-line display
name, and navDisplayName mapping for placeholder user ids (e.g. default_user_id).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex

- Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock
- Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages
- Remove redundant equality checks in navDisplayName (regex already covers them)
- Remove unused `lower` variable after simplification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix(dashboard): drop dead useHealthReadiness import in navbar

The module was removed in #27896 (replaced by useHealthReadinessDetails),
but the import survived the rebase. The symbol is unused — only
useHealthReadinessDetails is consumed in the file. Removing the dead
import unblocks the UI TypeScript build.

* fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels

The component was refactored to an icon-only chip with aria-label='LiteLLM
on GitHub' (squash #27543), but the test still asserted /star us on
github/i. Update the query to match the rendered accessible name.

* refactor(dashboard): drop unused props from NavbarProps

The navbar refactor moved user identity + dark-mode state to internal
hooks (useAuthorized, useWorker), but the NavbarProps interface still
declared userID, userEmail, userRole, premiumUser, isDarkMode, and
toggleDarkMode as required, forcing every caller to thread them through.

Drop them from the interface and all four call sites (page.tsx,
(dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also
shrinks the destructure in layout.tsx so the now-unused locals stop
being pulled out of useAuthorized().

* refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag

Reads/writes of the litellmHideAgentPlatformBanner key were done
directly inside NotificationsBell via a useEffect + useState pair.
Every other localStorage-backed flag in the dashboard
(DisableShowPrompts, DisableBouncingIcon, DisableShowNewBadge,
DisableUsageIndicator, DisableBlogPosts) is wrapped in a
useSyncExternalStore hook over localStorageUtils so all mounted
components stay in sync.

Extract useHideAgentPlatformBanner to follow the same shape, swap
NotificationsBell to consume it, and add a regression test that
two sibling bells stay in sync without a remount when one is
dismissed.

* refactor: mask credential fields in proxy settings GET responses (#28682)

* refactor: mask credential fields in proxy settings GET responses

Brings SSO settings, cache settings, and the email/Slack alerting view in
/get/config/callbacks in line with the HashiCorp Vault config-override
pattern, so persisted credentials are not transported back to the UI in
plaintext.

* refactor: harden short-value masking and hoist alerting var constant

Closes two review observations:

- mask_sensitive_keys now replaces short values (below the visible
  prefix+suffix length) with an all-mask string instead of returning them
  unchanged, so a 1-7 character credential is no longer round-tripped
  verbatim.
- _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level
  constant, matching the analogous _SSO_SENSITIVE_FIELDS and
  _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files.
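
A sketch of the short-value rule, assuming a 4-character visible prefix and a 4-character suffix. These widths are consistent with "1-7 character" values counting as short. The names, the field set, and the fixed mask widths are hypothetical. One deliberate difference from the commit: this sketch also fully masks a value whose length equals prefix + suffix, since showing both ends of such a value would reveal all of it.

```python
from typing import Any, Dict

# Assumed widths: 4 + 4 = 8, so values of 1-7 characters are "short".
VISIBLE_PREFIX = 4
VISIBLE_SUFFIX = 4
FULL_MASK = "*" * 8

# Hypothetical module-level constant, hoisted like _SSO_SENSITIVE_FIELDS.
_EXAMPLE_SENSITIVE_FIELDS = frozenset({"client_secret", "password", "api_key"})


def mask_value(value: str) -> str:
    # Short values become an all-mask string instead of being returned verbatim.
    if len(value) <= VISIBLE_PREFIX + VISIBLE_SUFFIX:
        return FULL_MASK
    # Fixed-width middle, so the masked string does not leak the value's length.
    return value[:VISIBLE_PREFIX] + "****" + value[-VISIBLE_SUFFIX:]


def mask_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Mask sensitive string fields before returning settings to the UI."""
    return {
        key: (
            mask_value(value)
            if key in _EXAMPLE_SENSITIVE_FIELDS and isinstance(value, str) and value
            else value
        )
        for key, value in settings.items()
    }
```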

---------

Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ui): show 2-decimal precision for max_budget on key overview (#28809)

The Key Info Overview tab's Spend card truncated sub-dollar budgets to
"$0" because formatNumberWithCommas defaults to 0 decimals. The Settings
tab passes 2; align the overview so a $0.10 budget renders as "$0.10".

Resolves LIT-2845

* feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers (#28442)

* feat(proxy): allow llm_api_routes virtual keys to list MCP servers

Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET
/v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that
virtual keys configured with `allowed_routes=["llm_api_routes"]` can
discover the MCP servers they have access to. Previously these calls
failed with 'Virtual key is not allowed to call this route. Only allowed
to call routes: [llm_api_routes]'.

The GET handlers already sanitize the response for restricted virtual
keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping
credential-bearing fields (url, headers, env). Write methods
(POST/PUT/DELETE) on the same paths remain gated by the existing
handler-level admin role checks.

The new discovery list is intentionally kept OUT of
`mcp_inference_routes`, so `is_llm_api_route()` still returns False
for these paths — this preserves the existing contract that
DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP
servers.

Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* refactor(proxy): make MCP discovery carve-out method-aware

Replace the `mcp_discovery_routes` group in `llm_api_routes` with a
method-aware special case inside `is_virtual_key_allowed_to_call_route`.
Virtual keys with allowed_routes=["llm_api_routes"] are now permitted
to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} —
non-GET methods and multi-segment admin sub-paths fall through to the
existing 403. This keeps the general llm_api_routes list free of
management paths and avoids accidentally exposing POST/PUT/DELETE
writes through the route-check layer.
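
A sketch of the method-aware carve-out, assuming routes arrive already normalized (no root_path, no query string). The function names are hypothetical. Any deeper path under `/v1/mcp/server/` fails the `fullmatch`, which is how the sketch models the fall-through to the existing 403.

```python
import re
from typing import Sequence

# /v1/mcp/server or /v1/mcp/server/{server_id}: at most one extra segment.
_MCP_DISCOVERY_ROUTE = re.compile(r"/v1/mcp/server(?:/[^/]+)?")


def is_mcp_discovery_call(method: str, route: str) -> bool:
    """Hypothetical check: read-only access to the two discovery routes."""
    return method.upper() == "GET" and _MCP_DISCOVERY_ROUTE.fullmatch(route) is not None


def llm_api_key_may_call(allowed_routes: Sequence[str], method: str, route: str) -> bool:
    # Only keys scoped to llm_api_routes get the carve-out; everything else
    # goes through the existing route checks unchanged.
    return "llm_api_routes" in allowed_routes and is_mcp_discovery_call(method, route)
```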

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>

* chore(ci): merge dev branch (#28807)

* chore(proxy): route path-dependent call sites through get_request_route

Replace direct ``request.url.path`` reads in auth, ACL, routing, and
audit-log decisions with ``get_request_route(request)`` — the helper
already added in ``auth/auth_utils.py`` that returns the ASGI
``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs
``url.path`` from the Host header; ``scope["path"]`` is uvicorn's
parse of the request line and matches what FastAPI dispatches on, so
it's the authoritative route for any decision that should agree with
the actual handler.

Sites:
- _experimental/mcp_server/auth/user_api_key_auth_mcp.py
- management_endpoints/mcp_management_endpoints.py
- vector_store_endpoints/utils.py
- pass_through_endpoints/pass_through_endpoints.py
- auth/route_checks.py
- litellm_pre_call_utils.py
- spend_tracking/spend_management_endpoints.py
- common_utils/http_parsing_utils.py
- management_helpers/utils.py
- health_endpoints/_health_endpoints.py

Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py
that construct a Request with scope["path"] set to a benign route and
the Host header crafted so url.path would resolve differently; each
site's decision is asserted against scope["path"].

* chore(proxy): make get_request_route imports lazy at call sites

Move the ``from litellm.proxy.auth.auth_utils import get_request_route``
imports added in the prior commit back to the function bodies that use
them. The module-level form participates in a long-standing import
cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL
on the PR; the lazy form matches the pattern the proxy already uses
for ``user_api_key_auth`` and related helpers elsewhere in these files.

Also drop the ``RouteChecks._is_assistants_api_request`` delegation in
``_get_metadata_variable_name`` introduced in the prior commit — the
delegation pulled ``RouteChecks`` into the same cycle, and the call
site reuses the resolved route for its other branches, so inlining
the substring check is both cycle-free and avoids a redundant second
``get_request_route`` call.

Comment in test_proxy_routes.py acknowledges that the two MCP table
entries exercise ``get_request_route`` directly rather than the full
production handler (which needs ASGI scope + MCP state to invoke).

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>

* fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737)

* fix(team): keep team_alias cache in sync on _cache_team_object writes

_cache_team_object wrote only to the team_id:<id> cache key, but the
JWT auth path that uses team_alias_jwt_field reads from a separate
team_alias:<alias> key (get_team_object_by_alias caches under both
keys on miss, but reads only the alias-keyed one). After any
team-mutation endpoint (team_model_add, team_model_delete,
update_team, the two access-group writes) the team_id cache was
refreshed but the team_alias cache stayed stale until TTL — JWT
callers using team_alias_jwt_field kept seeing the pre-mutation
team for the full cache window.

Mirror the write under the alias key inside _cache_team_object so
every existing caller stays in sync without further changes. Skip
the alias write when team_alias is None/empty so we don't collide
across alias-less teams.

Surfaced while testing the LIT-3244 cherry-pick on patch/1.86.0: the
LIT-3244 fix correctly invalidated the team_id cache, but the
customer's JWT used team_alias_jwt_field, so they kept hitting the
stale alias-keyed entry.

* fix(team): delete (not overwrite) team_alias cache on _cache_team_object

The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias>
from _cache_team_object. team_alias is NOT unique in the schema
(no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias
enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises).
Writing the alias-keyed cache from the generic refresh path bypassed
that check: a team admin renaming their team to collide with another
team's alias could silently overwrite the cached team for JWT-by-alias
auth, swapping the resolved team under that alias for the cache window.

Switch the alias-keyed operation from a write to a delete (mirroring
the dual-cache delete pattern in _delete_cache_key_object). After every
team write, the next JWT-by-alias reader cache-misses and falls through
to get_team_object_by_alias, which (a) re-fetches the fresh team from
DB, closing the LIT-3244 staleness gap that motivated this PR, and
(b) enforces alias uniqueness before populating either cache key.

team_id:<id> writes are unchanged — team_id is the table PK and is
guaranteed unique.
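
A sketch of the write-by-id, delete-by-alias pattern, using a plain dict as the cache. `TeamCacheSketch` and its methods are hypothetical stand-ins for `_cache_team_object` and `get_team_object_by_alias`. The uniqueness check is modeled on the `len(teams) > 1` rule described above.

```python
from typing import Any, Dict, List, Optional


class TeamCacheSketch:
    """Hypothetical in-memory stand-in for the proxy's team cache."""

    def __init__(self) -> None:
        self.store: Dict[str, Dict[str, Any]] = {}

    def cache_team_object(self, team: Dict[str, Any]) -> None:
        # team_id is the primary key, so writing under it is always safe.
        self.store[f"team_id:{team['team_id']}"] = team
        alias: Optional[str] = team.get("team_alias")
        if alias:
            # Delete, don't write: aliases are not unique, so the next reader
            # must go through the uniqueness-checked lookup below.
            self.store.pop(f"team_alias:{alias}", None)

    def get_team_by_alias(self, alias: str, db_teams: List[Dict[str, Any]]) -> Dict[str, Any]:
        key = f"team_alias:{alias}"
        if key in self.store:
            return self.store[key]
        matches = [t for t in db_teams if t.get("team_alias") == alias]
        # The commit only says >1 raises; raising on 0 as well is a sketch choice.
        if len(matches) != 1:
            raise ValueError(f"expected one team with alias {alias!r}, got {len(matches)}")
        team = matches[0]
        self.store[key] = team
        self.store[f"team_id:{team['team_id']}"] = team
        return team
```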

Surfaced in veria-ai review on #28739.

* fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id

extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)`
which substring-matches the `model_id,` inside the file-ID encoding's
`llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id
then fed that deployment UUID back into the auth path as a model
candidate via _extract_models_from_managed_resource_id, and every
team-BYOK file attach 403'd with:

    team not allowed to access model. This team can only access
    models=['openai/*']. Tried to access <deployment-uuid>

The team's models list correctly contains the public name (`openai/*`)
that target_model_names matches, but the bogus UUID candidate fails
the wildcard check first.

Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it
matches the legitimate top-level `model_id,<value>` field on
vector_store unified IDs and skips substring matches inside other
fields. File-IDs (which have no top-level `model_id` field) now
return None and contribute no spurious UUID candidate.
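
The difference between the two patterns can be checked directly with Python's `re`. The unified-ID strings below are illustrative decoded layouts (semicolon-separated `key,value` fields), not exact LiteLLM encodings. `extract_model_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Before: substring match, so it also fires inside llm_output_file_model_id,...
UNANCHORED = re.compile(r"model_id,([^;]+)")
# After: model_id must start the string or follow a ';' field separator.
ANCHORED = re.compile(r"(?:^|;)model_id,([^;]+)")


def extract_model_id(unified_id: str, pattern: re.Pattern = ANCHORED) -> Optional[str]:
    """Hypothetical helper; returns the top-level model_id field, if any."""
    match = pattern.search(unified_id)
    return match.group(1) if match else None


# Illustrative decoded IDs; real unified IDs may carry other fields and prefixes.
file_id = "unified_id,abc;target_model_names,openai/gpt-4o;llm_output_file_model_id,dep-uuid-1"
vector_store_id = "unified_id,xyz;model_id,dep-uuid-2"

print(extract_model_id(file_id, UNANCHORED))  # dep-uuid-1 (the bogus candidate)
print(extract_model_id(file_id))              # None
print(extract_model_id(vector_store_id))      # dep-uuid-2
```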

Surfaced while reproducing LIT-3244 on patch/1.86.0 with the customer's
exact flow: team with openai/* BYOK deployment, JWT-scoped user,
POST /v1/vector_stores/{id}/files attaching a file uploaded with
target_model_names=openai/gpt-4o.

* fix(proxy): hydrate wildcard discovery credentials (#28284) (#28822)

* fix(proxy): hydrate wildcard discovery credentials

* fix(proxy): constrain wildcard credential hydration

Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>

* ci: add daily oss-agent-shin branch creation workflow (#28829)

Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC.
Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* test(proxy): add harness for proxy_server.py behavior-pinning (#28827)

* test(proxy): add harness for proxy_server.py behavior-pinning

Creates tests/test_litellm/proxy/proxy_server/ with:
- conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as,
  mock_router with parametrized response builders, normalize, etc.)
- _coverage_check.py: per-PR coverage gate (line + branch) against a
  baseline; self-selects its target by inspecting which placeholder files
  have been filled
- _pin_check.py: AST-based gate that verifies every pin-list item has
  >=1 happy + >=1 error test with a real assertion (status-code-only
  assertions don't count)
- test_harness_smoke.py: 19 smoke tests covering every fixture +
  both scripts end-to-end
- 26 placeholder test files (one docstring each) reserved for
  follow-up PRs per the directory ownership in the Notion plan
- .coverage_baseline pinned at 0% so future PRs measure deltas
  against new-tests-only and aren't entangled with the broader
  scattered test suite

Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml
so this directory's runtime + coverage are tracked independently.

Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc

* ci(proxy-endpoints): allow workflow_dispatch

Lets the workflow be triggered manually on a branch via
`gh workflow run`, which is needed for the verify-first
flow on workflow changes before opening a PR.

* test(proxy): address review feedback on proxy_server harness

- conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4])
  instead of CWD-relative os.path.abspath("../../../../") which resolved
  to the wrong directory when pytest is launched from the repo root.
- _coverage_check.py: actually read .coverage_baseline and use it as
  the floor (line_min = max(target, baseline)). Closes the gap between
  the PR description's "delta semantics" and what the script was doing.
  With baseline=0.0 today this is a no-op; future PRs that update the
  baseline cause regressions (test deletions etc.) to trip the gate
  even if the static PR target is still met.
- _pin_check.py: drop unreachable startswith("_") guard
  (test_*.py glob never yields underscore-prefixed names) and read
  each test file once instead of twice.
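Here is a minimal sketch of the `line_min = max(target, baseline)` floor. It assumes `.coverage_baseline` holds a single percentage as plain text; the file's actual format is not shown in this commit. Both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_baseline(path: Path) -> Optional[str]:
    """Return the raw baseline text, or None if the file cannot be read."""
    try:
        return path.read_text()
    except OSError:
        return None


def effective_line_floor(target: float, baseline_text: Optional[str]) -> float:
    """line_min = max(target, baseline); a missing or malformed baseline counts as 0.0."""
    try:
        baseline = float(baseline_text.strip()) if baseline_text else 0.0
    except ValueError:
        baseline = 0.0
    return max(target, baseline)


# With baseline 0.0 the static target wins; a raised baseline tightens the gate.
print(effective_line_floor(60.0, "0.0"), effective_line_floor(60.0, "72.5\n"))
```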

* feat(openai): apply regional-processing cost uplift for EU/US data residency (#28626)

* feat(openai): apply regional-processing cost uplift for EU/US data residency

OpenAI charges a 10% uplift on the latest GPT models when requests are
served from a regionalized hostname (eu./us.api.openai.com).  Infer the
region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`,
and multiply the computed cost by a per-model
`regional_processing_uplift_multiplier_<region>` field.

https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW
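As a worked illustration of the uplift, here is a sketch modelled on `_get_regional_uplift_multiplier` as it appears later in this commit. Two values are assumptions for the example: the `DataResidency` enum values `eu`/`us`, and the 1.1 multiplier, which stands for the 10% uplift mentioned above.

```python
from enum import Enum
from typing import Optional


class DataResidency(str, Enum):  # assumed members, mirroring eu./us.api.openai.com
    EU = "eu"
    US = "us"


# Built once at import time instead of on every cost calculation
_VALID_DATA_RESIDENCIES = frozenset(r.value for r in DataResidency)


def regional_uplift_multiplier(model_info: dict, data_residency: Optional[str]) -> float:
    """Return the per-model regional multiplier, or 1.0 when none applies."""
    if data_residency is None:
        return 1.0
    residency = data_residency.lower()
    if residency not in _VALID_DATA_RESIDENCIES:
        return 1.0
    multiplier = model_info.get(f"regional_processing_uplift_multiplier_{residency}")
    return float(multiplier) if multiplier is not None else 1.0


model_info = {"regional_processing_uplift_multiplier_eu": 1.1}
base_cost = 0.002
print(round(base_cost * regional_uplift_multiplier(model_info, "EU"), 6))  # 0.0022
```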

* test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema

* fix(cost): tighten data_residency inference and restore model_cost in tests

- Only infer OpenAI data_residency when custom_llm_provider == "openai";
  drop the implicit None fallback so non-OpenAI callers can't accidentally
  pick up a regional tag from a stray OpenAI hostname.
- _local_model_cost_map fixture now snapshots and restores
  litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak
  state across the session.

* refactor(openai): move data_residency helper under llms/openai

* fix: thread data_residency through realtime stream cost calculation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(cost): thread data_residency through batch_cost_calculator

Apply the OpenAI regional-processing uplift multiplier to retrieve_batch
cost paths so Batch API requests served via eu./us.api.openai.com are
priced at the same uplifted token rates as completions/transcriptions.

* refactor(openai): encapsulate provider check inside infer_openai_data_residency

Move the custom_llm_provider == "openai" guard from get_litellm_params
into the helper itself so the core utility no longer carries
provider-specific dispatch logic. Callers pass through the provider
unconditionally; the helper returns None for any non-OpenAI provider.
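Here is a hedged sketch of the encapsulated helper. It assumes the helper takes `api_base` and `custom_llm_provider` and matches the exact hostnames `eu.api.openai.com` / `us.api.openai.com`. The real signature of `infer_openai_data_residency` is not shown here, so treat this one as hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed mapping from regional OpenAI hostnames to residency tags
_REGIONAL_HOSTS = {"eu.api.openai.com": "eu", "us.api.openai.com": "us"}


def infer_openai_data_residency(
    api_base: Optional[str], custom_llm_provider: Optional[str]
) -> Optional[str]:
    """Return "eu"/"us" for regional OpenAI hosts, else None.

    The provider guard lives inside the helper, so callers can pass any
    provider through unconditionally.
    """
    if custom_llm_provider != "openai" or not api_base:
        return None
    # hostname is None for scheme-less strings; treat that as "no match"
    host = (urlparse(api_base).hostname or "").lower()
    return _REGIONAL_HOSTS.get(host)
```

Matching on the parsed hostname, not on a substring of the URL, prevents something like `https://proxy.example/eu.api.openai.com` from being tagged as regional.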

* fix(responses): thread data_residency through Responses logging params

The Responses API paths build their logging litellm_params dict after
provider resolution but did not include data_residency, so cost calc
saw None even when the effective api_base was a regional OpenAI host.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* fix: preserve OTEL response payload and remove duplicate constant

- _emit_management_endpoint_otel_span now passes result as response on success
- remove duplicate _CREDENTIAL_LITELLM_PARAM_FIELDS assignment in model_checks

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: address bug detection findings

- pass_through_endpoints: use request.method instead of hardcoded POST
  in streaming SigV4-signed request path for consistency with the
  non-streaming branch
- llm_cost_calc/utils: hoist DataResidency value set to a module-level
  frozenset to avoid rebuilding it on every cost calculation
- example_config_yaml/oai_misc_config: replace real-looking AWS account
  ID with placeholder 123456789012 in example bucket and role ARN

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* chore(github_copilot): refresh model catalog from upstream /models API (#28055)

Aligns the github_copilot catalog with values returned by Copilot's
public /models endpoint (capabilities.limits + capabilities.supports +
model.supported_endpoints).

- Adds 10 new model entries: claude-opus-4.7, claude-sonnet-4.6,
  gemini-3-flash-preview, gemini-3.1-pro-preview, gpt-4-0125-preview,
  gpt-5.2-codex, gpt-5.4, gpt-5.4-mini, gpt-5.5, oswe-vscode-prime.
- Updates max_input_tokens for existing entries to reflect each
  model's true context window (e.g. gpt-4o-mini 64000 -> 128000,
  gpt-5-mini 128000 -> 264000, gpt-5.3-codex 128000 -> 400000,
  claude-haiku-4.5 128000 -> 200000).
- Adds supports_reasoning, supports_response_schema,
  supports_function_calling, supports_parallel_function_calling,
  supports_vision based on capabilities.supports.
- Declares supported_endpoints for entries missing it
  (e.g. gpt-3.5-turbo, gpt-4o, embeddings).
- For responses-only models (gpt-5.2-codex, gpt-5.4, gpt-5.4-mini,
  gpt-5.5), sets mode to 'responses'.
- gpt-41-copilot.mode changes from 'completion' to 'chat' because
  Copilot reports capabilities.type = 'chat'. Revertible on request.

Pricing fields and other manually-curated values are preserved.

* feat(datadog): emit litellm.overhead.latency as a standalone Datadog metric (#28831)

Adds a new `litellm.overhead.latency` gauge metric to `DatadogMetricsLogger`
(the `/api/v2/series` path). The value is sourced from
`hidden_params["litellm_overhead_time_ms"]` already computed in
`ResponseMetadata` and exposed in `StandardLoggingPayload`.

Matches the Prometheus integration which exposes the same value via
`litellm_overhead_latency_metric`. Emitted in seconds (ms ÷ 1000) for
consistency with the other latency series.
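The new series is a gauge whose value is `litellm_overhead_time_ms` converted to seconds. Below is a standalone sketch that builds that payload as a plain dict. The function name is hypothetical, and the type code `3` is taken from the `"type": 3, # gauge` line in this commit.

```python
from typing import Any, Dict, List, Optional

DATADOG_GAUGE_TYPE = 3  # gauge type code, as used in this commit's series dicts


def build_overhead_series(
    hidden_params: Optional[Dict[str, Any]], timestamp: int, tags: List[str]
) -> Optional[Dict[str, Any]]:
    """Return a litellm.overhead.latency series, or None when overhead is unknown."""
    overhead_ms = (hidden_params or {}).get("litellm_overhead_time_ms")
    if overhead_ms is None:
        return None  # skip the metric instead of emitting a misleading 0
    return {
        "metric": "litellm.overhead.latency",
        "type": DATADOG_GAUGE_TYPE,
        "points": [{"timestamp": timestamp, "value": overhead_ms / 1000}],  # ms -> s
        "tags": tags,
    }


print(build_overhead_series({"litellm_overhead_time_ms": 250}, 1700000000, ["env:prod"]))
```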

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Shin <shin@litellm.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>

* feat(arize): route Phoenix traces via per-project TracerProviders (#28876)

Use LRU-cached TracerProviders with project-scoped OTEL Resources so team/key
metadata routes traces correctly. On the proxy, project selection is limited to
server-controlled user_api_key_auth_metadata; client metadata fields stay banned.
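The caching strategy in the diff below can be illustrated without OpenTelemetry. It uses an `OrderedDict` guarded by a lock, capped at `_MAX_PROJECT_PROVIDERS = 64`, and builds each provider outside the lock. In this hypothetical sketch, a generic factory stands in for constructing a `TracerProvider`.

```python
import threading
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

V = TypeVar("V")


class LRUProviderCache(Generic[V]):
    """Bounded LRU keyed by project name; the factory runs outside the lock."""

    def __init__(self, factory: Callable[[str], V], max_size: int = 64) -> None:
        self._factory = factory
        self._max_size = max_size
        self._items: "OrderedDict[str, V]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key: str) -> V:
        with self._lock:
            if key in self._items:
                self._items.move_to_end(key)  # mark as most recently used
                return self._items[key]
        created = self._factory(key)  # possibly slow; other keys are not blocked
        with self._lock:
            if key in self._items:  # another thread inserted it meanwhile
                self._items.move_to_end(key)
                return self._items[key]
            if len(self._items) >= self._max_size:
                self._items.popitem(last=False)  # evict least recently used
            self._items[key] = created
            return created

    def keys(self) -> List[str]:
        with self._lock:
            return list(self._items)


cache = LRUProviderCache(lambda name: f"provider:{name}", max_size=2)
for project in ("a", "b", "a", "c"):  # "b" is least recently used when "c" arrives
    cache.get(project)
print(cache.keys())
```

The double-checked second lock means two threads racing on the same new project may both build an object. The cache keeps whichever landed first and discards the other.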

* fix(arize_phoenix): skip _emit_semantic_logs on failure path

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(arize_phoenix): skip raw request logging and metrics on failure path

Restores pre-refactor behavior: _handle_failure no longer emits raw-request
sub-spans or records OTEL metrics, matching the original _handle_failure
that did not call these helpers.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(security): close two medium telemetry trust-boundary issues

Issue 1 (arize_phoenix.py — caller-controlled telemetry routing):
- _is_proxy_request no longer detects proxy mode by checking
  user_api_key_auth_metadata in request metadata.  That field is
  user-supplied, so an authenticated caller could fake proxy-mode
  detection and have _project_from_metadata_dict read their own dict
  for project selection, routing telemetry to arbitrary Arize/Phoenix
  projects.  Proxy mode is now determined solely by the server-set
  proxy_server_request field in litellm_params.
- auth_utils.py adds user_api_key_auth_metadata to the banned request
  body params list so the proxy rejects any attempt to supply the field
  at the HTTP layer.  The field is server-reserved: it is written
  exclusively by add_user_api_key_auth_to_request_metadata from the
  authenticated key's database record after the ban check runs.
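This is a simplified, hypothetical re-implementation of the resolution order from `_resolve_project_name` in the diff below, written without any OTEL dependency. The helper names are illustrative, not LiteLLM's. On the proxy, `user_api_key_auth_metadata` is trusted only because the proxy rejects any client attempt to supply it.

```python
import os
from typing import Iterator, Optional


def _normalize(name: Optional[object]) -> Optional[str]:
    if name is None:
        return None
    text = str(name).strip()
    return text or None


def _metadata_dicts(kwargs: dict) -> Iterator[dict]:
    # standard_logging_object is consulted before litellm_params
    for key in ("standard_logging_object", "litellm_params"):
        container = kwargs.get(key)
        if isinstance(container, dict) and isinstance(container.get("metadata"), dict):
            yield container["metadata"]


def _is_proxy_request(kwargs: dict) -> bool:
    # Only the server-set proxy_server_request field decides proxy mode
    params = kwargs.get("litellm_params")
    return isinstance(params, dict) and bool(params.get("proxy_server_request"))


def _project_field(kwargs: dict, field: str) -> Optional[str]:
    proxy_mode = _is_proxy_request(kwargs)
    for metadata in _metadata_dicts(kwargs):
        auth = metadata.get("user_api_key_auth_metadata")
        if isinstance(auth, dict) and _normalize(auth.get(field)):
            return _normalize(auth.get(field))
        if not proxy_mode and _normalize(metadata.get(field)):
            return _normalize(metadata.get(field))  # SDK callers only
    return None


def resolve_project_name(kwargs: dict) -> str:
    for field in ("phoenix_project_name_override", "phoenix_project_name"):
        project = _project_field(kwargs, field)
        if project:
            return project
    env = _normalize(
        os.environ.get("PHOENIX_PROJECT_NAME") or os.environ.get("ARIZE_PROJECT_NAME")
    )
    return env or "default"
```

Under this sketch, a proxy request whose client metadata sets `phoenix_project_name` falls through to the environment or `default`. The same field on an SDK call is honoured.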

Issue 2 (management_helpers/utils.py — API key in OTEL span):
- _emit_management_endpoint_otel_span now strips plaintext credential
  fields (key, token, api_key, secret, …) from the response dict before
  passing it to the OTEL success hook.  dict(result) on a Pydantic
  GenerateKeyResponse includes the freshly-generated key field, which
  would previously be written as a span attribute to every configured
  OTEL collector/backend.
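Here is a minimal sketch of the stripping step, assuming a flat, top-level filter. The field set below is only the subset named in this message, since the full list is elided. `strip_credentials` is a hypothetical name.

```python
from typing import Any, Dict, FrozenSet, Mapping

# Illustrative subset only; the commit's full credential field list is elided.
_CREDENTIAL_FIELDS: FrozenSet[str] = frozenset({"key", "token", "api_key", "secret"})


def strip_credentials(response: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of *response* without top-level credential fields."""
    return {k: v for k, v in response.items() if k not in _CREDENTIAL_FIELDS}


print(strip_credentials({"key": "sk-123", "key_alias": "ci", "token": "abc", "expires": None}))
```

This filters only top-level keys. A nested dict that carries a credential would need a recursive pass.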

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com>
Co-authored-by: Shin <shin@litellm.ai>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
This commit is contained in:
Sameer Kankute 2026-05-27 00:27:39 +05:30 committed by GitHub
parent bd2d0ad519
commit d52fbfb458
GPG Key ID: B5690EEEBB952194
18 changed files with 1649 additions and 250 deletions


@@ -30,7 +30,7 @@ spec:
checksum/config: {{ include (print $.Template.BasePath "/configmap-litellm.yaml") . | sha256sum }}
{{- end }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- tpl (toYaml .) $ | nindent 8 }}
{{- end }}
labels:
{{- include "litellm.labels" . | nindent 8 }}


@@ -377,3 +377,28 @@ tests:
content:
name: sidecar-tpl
image: "ghcr.io/berriai/litellm-database:test"
- it: should support tpl in podAnnotations
template: deployment.yaml
set:
image:
repository: ghcr.io/berriai/litellm-database
tag: test
# Mirrors the real-world scenario this feature unblocks:
# user disables the built-in ConfigMap (and its built-in checksum/config
# annotation) and re-implements checksum/config themselves via tpl.
proxyConfigMap:
create: false
podAnnotations:
checksum/config: "{{ .Values.image.tag }}"
example.com/some-key: "{{ .Values.image.repository }}"
example.com/literal: "plain-string-value"
asserts:
- equal:
path: spec.template.metadata.annotations["checksum/config"]
value: "test"
- equal:
path: spec.template.metadata.annotations["example.com/some-key"]
value: "ghcr.io/berriai/litellm-database"
- equal:
path: spec.template.metadata.annotations["example.com/literal"]
value: "plain-string-value"


@@ -1,5 +1,7 @@
import os
from typing import TYPE_CHECKING, Any, Optional, Union
import threading
from collections import OrderedDict
from typing import TYPE_CHECKING, Any, Optional, Tuple, Union
from litellm._logging import verbose_logger
from litellm.integrations.arize import _utils
@@ -8,8 +10,10 @@ from litellm.types.integrations.arize_phoenix import ArizePhoenixConfig
if TYPE_CHECKING:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SpanProcessor
from opentelemetry.trace import Span as _Span
from opentelemetry.trace import SpanKind
from opentelemetry.trace import Tracer
from litellm.integrations.opentelemetry import OpenTelemetry as _OpenTelemetry
from litellm.integrations.opentelemetry import (
@@ -21,20 +25,27 @@ if TYPE_CHECKING:
OpenTelemetryConfig = _OpenTelemetryConfig
Span = Union[_Span, Any]
OpenTelemetry = _OpenTelemetry
LITELLM_TRACER_NAME: str
else:
Protocol = Any
OpenTelemetryConfig = Any
Span = Any
Tracer = Any
TracerProvider = Any
SpanKind = Any
# Import OpenTelemetry at runtime
SpanProcessor = Any
try:
from litellm.integrations.opentelemetry import OpenTelemetry
from litellm.integrations.opentelemetry import (
LITELLM_TRACER_NAME,
OpenTelemetry,
)
except ImportError:
LITELLM_TRACER_NAME = "litellm"
OpenTelemetry = None # type: ignore
ARIZE_HOSTED_PHOENIX_ENDPOINT = "https://otlp.arize.com/v1/traces"
_MAX_PROJECT_PROVIDERS = 64
class ArizePhoenixLogger(OpenTelemetry): # type: ignore
@@ -48,37 +59,142 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
def _init_tracing(self, tracer_provider):
"""
Override to always create a *private* TracerProvider for Arize Phoenix.
Override to create per-project TracerProviders (LRU-cached) for Arize Phoenix.
The base ``OpenTelemetry._init_tracing`` falls back to the global
TracerProvider when one already exists. That causes whichever
integration initialises second to silently reuse the first one's
exporter, so spans only reach one destination.
By creating our own provider we guarantee Arize Phoenix always gets
its own exporter pipeline, regardless of initialisation order.
"""
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.trace import SpanKind
if tracer_provider is not None:
# Explicitly supplied (e.g. in tests) — honour it.
self.tracer = tracer_provider.get_tracer("litellm")
self._use_injected_tracer_provider = True
self._shared_span_processor = None
self.tracer = tracer_provider.get_tracer(LITELLM_TRACER_NAME)
self.span_kind = SpanKind
return
# Always create a dedicated provider — never touch the global one.
provider = TracerProvider(resource=self._get_litellm_resource(self.config))
provider.add_span_processor(self._get_span_processor())
self.tracer = provider.get_tracer("litellm")
self._use_injected_tracer_provider = False
self._project_providers: OrderedDict[str, TracerProvider] = OrderedDict()
self._project_providers_lock = threading.Lock()
self._shared_span_processor = self._get_span_processor()
self.span_kind = SpanKind
default_project = self._resolve_project_name({})
self.tracer = self._get_tracer_for(default_project)
verbose_logger.debug(
"ArizePhoenixLogger: Created dedicated TracerProvider "
"(endpoint=%s, exporter=%s)",
"ArizePhoenixLogger: Initialized per-project TracerProvider cache "
"(default_project=%s, endpoint=%s, exporter=%s)",
default_project,
self.config.endpoint,
self.config.exporter,
)
def flush_tracer_providers(self) -> None:
"""
Flush all cached per-project providers and the shared span processor.
Call on graceful proxy shutdown. Do not call on LRU eviction: in-flight
spans may still reference evicted providers.
"""
if getattr(self, "_use_injected_tracer_provider", False):
return
shared_processor = getattr(self, "_shared_span_processor", None)
if shared_processor is not None:
try:
shared_processor.force_flush()
except Exception as e:
verbose_logger.debug(
"ArizePhoenixLogger: shared span processor force_flush failed: %s",
e,
)
with getattr(self, "_project_providers_lock", threading.Lock()):
providers = list(getattr(self, "_project_providers", {}).values())
for provider in providers:
try:
provider.force_flush()
except Exception as e:
verbose_logger.debug(
"ArizePhoenixLogger: TracerProvider force_flush failed: %s", e
)
def _get_litellm_resource_for_project(self, project_name: str):
"""
Build an OTEL Resource whose project-routing attrs take precedence over the env detector's.
Phoenix uses ``openinference.project.name``; Arize AX uses ``model_id`` and
``service.name``. Project attrs are merged last so OTEL_RESOURCE_ATTRIBUTES
from init does not pin every provider to one project.
"""
from opentelemetry.sdk.resources import OTELResourceDetector, Resource
project_attributes: dict[str, str] = {
"openinference.project.name": project_name,
"model_id": project_name,
"service.name": project_name,
}
deployment_environment = getattr(self.config, "deployment_environment", None)
if deployment_environment is not None:
project_attributes["deployment.environment"] = deployment_environment
env_resource = OTELResourceDetector().detect()
project_resource = Resource.create(project_attributes) # type: ignore[arg-type]
return env_resource.merge(project_resource)
def _build_tracer_provider_for_project(self, project_name: str) -> TracerProvider:
"""Create a TracerProvider for *project_name* (caller holds no cache lock)."""
from opentelemetry.sdk.trace import TracerProvider
provider = TracerProvider(
resource=self._get_litellm_resource_for_project(project_name)
)
provider.add_span_processor(self._shared_span_processor)
return provider
def _get_tracer_for(self, project_name: str) -> Tracer:
"""Return a tracer for *project_name*, creating/caching a provider on miss."""
if getattr(self, "_use_injected_tracer_provider", False):
return self.tracer
with self._project_providers_lock:
if project_name in self._project_providers:
self._project_providers.move_to_end(project_name)
return self._project_providers[project_name].get_tracer(
LITELLM_TRACER_NAME
)
# OTELResourceDetector().detect() is synchronous; build outside the lock so
# concurrent requests for other projects are not blocked on cache misses.
new_provider = self._build_tracer_provider_for_project(project_name)
with self._project_providers_lock:
if project_name in self._project_providers:
self._project_providers.move_to_end(project_name)
return self._project_providers[project_name].get_tracer(
LITELLM_TRACER_NAME
)
if len(self._project_providers) >= _MAX_PROJECT_PROVIDERS:
self._project_providers.popitem(last=False)
self._project_providers[project_name] = new_provider
return new_provider.get_tracer(LITELLM_TRACER_NAME)
def _resolve_tracer_for_kwargs(self, kwargs: dict) -> Tuple[str, Tracer]:
"""Resolve project name once and return the matching tracer."""
project_name = self._resolve_project_name(kwargs)
return project_name, self._get_tracer_for(project_name)
def get_tracer_to_use_for_request(self, kwargs: dict) -> Tracer:
"""Route guardrail/raw-request spans to the same per-project tracer as the request."""
if getattr(self, "_use_injected_tracer_provider", False):
return self.tracer
return self._resolve_tracer_for_kwargs(kwargs)[1]
def _init_otel_logger_on_litellm_proxy(self):
"""
Override: Arize Phoenix should NOT overwrite the proxy's
@@ -93,56 +209,109 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
@staticmethod
def set_arize_phoenix_attributes(span: Span, kwargs, response_obj):
from litellm.integrations.opentelemetry_utils.base_otel_llm_obs_attributes import (
safe_set_attribute,
)
_utils.set_attributes(span, kwargs, response_obj, ArizeOTELAttributes)
# Dynamic project name: check metadata first, then fall back to env var config
dynamic_project_name = ArizePhoenixLogger._get_dynamic_project_name(kwargs)
if dynamic_project_name:
safe_set_attribute(span, "openinference.project.name", dynamic_project_name)
else:
# Fall back to static config from env var
config = ArizePhoenixLogger.get_arize_phoenix_config()
if config.project_name:
safe_set_attribute(
span, "openinference.project.name", config.project_name
)
return
@staticmethod
def _get_dynamic_project_name(kwargs) -> Optional[str]:
"""
Retrieve dynamic Phoenix project name from request metadata.
def _normalize_project_name(name: Optional[str]) -> Optional[str]:
if name is None:
return None
normalized = str(name).strip()
return normalized if normalized else None
Users can set `metadata.phoenix_project_name` in their request to route
traces to different Phoenix projects dynamically.
"""
standard_logging_payload = kwargs.get("standard_logging_object")
if isinstance(standard_logging_payload, dict):
metadata = standard_logging_payload.get("metadata")
@staticmethod
def _iter_metadata_dicts_from_kwargs(kwargs: dict):
"""Yield request metadata dicts; standard_logging_object before litellm_params."""
for key in ("standard_logging_object", "litellm_params"):
found_key = kwargs.get(key)
if not isinstance(found_key, dict):
continue
metadata = found_key.get("metadata")
if isinstance(metadata, dict):
project_name = metadata.get("phoenix_project_name")
if project_name:
return str(project_name)
yield metadata
# Also check litellm_params.metadata for SDK usage
@staticmethod
def _is_proxy_request(kwargs: dict) -> bool:
"""True when the call is routed through the LiteLLM proxy.
Proxy mode is determined solely by the server-set ``proxy_server_request``
field in ``litellm_params``. Checking request metadata for
``user_api_key_auth_metadata`` is intentionally avoided: that field is
user-supplied and would let an authenticated caller fake proxy-mode
detection to route their telemetry into arbitrary Arize/Phoenix projects.
"""
litellm_params = kwargs.get("litellm_params")
if isinstance(litellm_params, dict):
metadata = litellm_params.get("metadata") or {}
else:
metadata = {}
if isinstance(metadata, dict):
project_name = metadata.get("phoenix_project_name")
if project_name:
return str(project_name)
return isinstance(litellm_params, dict) and bool(
litellm_params.get("proxy_server_request")
)
@staticmethod
def _project_from_metadata_dict(
metadata: dict, metadata_key: str, *, proxy_mode: bool
) -> Optional[str]:
"""
Read a Phoenix project field from proxy/SDK metadata.
On the proxy, only ``user_api_key_auth_metadata`` (team/key config) may
select the project. SDK callers may still set project fields directly on
``metadata``.
"""
auth_metadata = metadata.get("user_api_key_auth_metadata")
if isinstance(auth_metadata, dict):
project = ArizePhoenixLogger._normalize_project_name(
auth_metadata.get(metadata_key)
)
if project:
return project
if not proxy_mode:
return ArizePhoenixLogger._normalize_project_name(
metadata.get(metadata_key)
)
return None
def _get_phoenix_context(self, kwargs):
@staticmethod
def _metadata_project_from_kwargs(kwargs: dict, metadata_key: str) -> Optional[str]:
proxy_mode = ArizePhoenixLogger._is_proxy_request(kwargs)
for metadata in ArizePhoenixLogger._iter_metadata_dicts_from_kwargs(kwargs):
project = ArizePhoenixLogger._project_from_metadata_dict(
metadata, metadata_key, proxy_mode=proxy_mode
)
if project:
return project
return None
@staticmethod
def _resolve_project_name(kwargs: dict) -> str:
"""
Resolve the target Phoenix/Arize project for this request.
Proxy priority: ``user_api_key_auth_metadata.phoenix_project_name_override``,
``user_api_key_auth_metadata.phoenix_project_name``, env, then ``default``.
SDK priority: request metadata fields, then env, then ``default``.
"""
override = ArizePhoenixLogger._metadata_project_from_kwargs(
kwargs, "phoenix_project_name_override"
)
if override:
return override
phoenix_name = ArizePhoenixLogger._metadata_project_from_kwargs(
kwargs, "phoenix_project_name"
)
if phoenix_name:
return phoenix_name
env_name = ArizePhoenixLogger._normalize_project_name(
os.environ.get("PHOENIX_PROJECT_NAME")
or os.environ.get("ARIZE_PROJECT_NAME")
)
if env_name:
return env_name
return "default"
def _get_phoenix_context(self, kwargs, tracer: Optional[Tracer] = None):
"""
Build a trace context for Phoenix's dedicated TracerProvider.
@@ -159,11 +328,13 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
"""
from opentelemetry import trace
if tracer is None:
tracer = self._resolve_tracer_for_kwargs(kwargs)[1]
litellm_params = kwargs.get("litellm_params", {}) or {}
proxy_server_request = litellm_params.get("proxy_server_request", {}) or {}
headers = proxy_server_request.get("headers", {}) or {}
# Propagate distributed trace context if the caller sent a traceparent
traceparent_ctx = (
self.get_traceparent_from_header(headers=headers)
if headers.get("traceparent")
@@ -173,10 +344,8 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
is_proxy_mode = bool(proxy_server_request)
if is_proxy_mode:
# Create a parent span on Phoenix's own tracer so both parent
# and child are exported to Phoenix.
start_time_val = kwargs.get("start_time", kwargs.get("api_call_start_time"))
parent_span = self.tracer.start_span(
parent_span = tracer.start_span(
name="litellm_proxy_request",
start_time=(
self._to_ns(start_time_val) if start_time_val is not None else None
@@ -187,100 +356,77 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
ctx = trace.set_span_in_context(parent_span)
return ctx, parent_span
# SDK mode — no parent span needed
return traceparent_ctx, None
def _handle_success(self, kwargs, response_obj, start_time, end_time):
"""
Override to always create spans on ArizePhoenixLogger's dedicated TracerProvider.
The base class's ``_get_span_context`` would find the parent span created by
the ``otel`` callback on the *global* TracerProvider. That span is invisible
in Phoenix (different exporter pipeline), so we ignore it and build our own
hierarchy via ``_get_phoenix_context``.
"""
from opentelemetry.trace import Status, StatusCode
verbose_logger.debug(
"ArizePhoenixLogger: Logging kwargs: %s, OTEL config settings=%s",
kwargs,
self.config,
self._handle_phoenix_trace(
kwargs, response_obj, start_time, end_time, success=True
)
ctx, parent_span = self._get_phoenix_context(kwargs)
# Create litellm_request span (child of our parent when in proxy mode)
span = self.tracer.start_span(
name=self._get_span_name(kwargs),
start_time=self._to_ns(start_time),
context=ctx,
)
span.set_status(Status(StatusCode.OK))
self.set_attributes(span, kwargs, response_obj)
# Raw-request sub-span (if enabled) — must be created before
# ending the parent span so the hierarchy is valid.
self._maybe_log_raw_request(kwargs, response_obj, start_time, end_time, span)
span.end(end_time=self._to_ns(end_time))
# Guardrail span
self._create_guardrail_span(kwargs=kwargs, context=ctx)
# Annotate and close our proxy parent span
if parent_span is not None:
parent_span.set_status(Status(StatusCode.OK))
self.set_attributes(parent_span, kwargs, response_obj)
parent_span.end(end_time=self._to_ns(end_time))
# Metrics & cost recording
self._record_metrics(kwargs, response_obj, start_time, end_time)
# Semantic logs
if self.config.enable_events:
self._emit_semantic_logs(kwargs, response_obj, span)
def _handle_failure(self, kwargs, response_obj, start_time, end_time):
"""
Override to always create failure spans on ArizePhoenixLogger's dedicated
TracerProvider. Mirrors ``_handle_success`` but sets ERROR status.
"""
self._handle_phoenix_trace(
kwargs, response_obj, start_time, end_time, success=False
)
def _handle_phoenix_trace(
self,
kwargs,
response_obj,
start_time,
end_time,
*,
success: bool,
):
from opentelemetry.trace import Status, StatusCode
verbose_logger.debug(
"ArizePhoenixLogger: Failure - Logging kwargs: %s, OTEL config settings=%s",
"ArizePhoenixLogger: %s - kwargs: %s, OTEL config settings=%s",
"success" if success else "failure",
kwargs,
self.config,
)
ctx, parent_span = self._get_phoenix_context(kwargs)
_project_name, tracer = self._resolve_tracer_for_kwargs(kwargs)
ctx, parent_span = self._get_phoenix_context(kwargs, tracer=tracer)
# Create litellm_request span (child of our parent when in proxy mode)
span = self.tracer.start_span(
status = Status(StatusCode.OK if success else StatusCode.ERROR)
span = tracer.start_span(
name=self._get_span_name(kwargs),
start_time=self._to_ns(start_time),
context=ctx,
)
span.set_status(Status(StatusCode.ERROR))
span.set_status(status)
self.set_attributes(span, kwargs, response_obj)
self._record_exception_on_span(span=span, kwargs=kwargs)
if not success:
self._record_exception_on_span(span=span, kwargs=kwargs)
if success:
self._maybe_log_raw_request(
kwargs, response_obj, start_time, end_time, span
)
span.end(end_time=self._to_ns(end_time))
# Guardrail span
self._create_guardrail_span(kwargs=kwargs, context=ctx)
# Annotate and close our proxy parent span
if parent_span is not None:
parent_span.set_status(Status(StatusCode.ERROR))
parent_span.set_status(status)
self.set_attributes(parent_span, kwargs, response_obj)
self._record_exception_on_span(span=parent_span, kwargs=kwargs)
if not success:
self._record_exception_on_span(span=parent_span, kwargs=kwargs)
parent_span.end(end_time=self._to_ns(end_time))
if success:
self._record_metrics(kwargs, response_obj, start_time, end_time)
if self.config.enable_events:
self._emit_semantic_logs(kwargs, response_obj, span)
@staticmethod
def get_arize_phoenix_config() -> ArizePhoenixConfig:
"""
Retrieves the Arize Phoenix configuration based on environment variables.
Returns:
ArizePhoenixConfig: A Pydantic model containing Arize Phoenix configuration.
"""
api_key = os.environ.get("PHOENIX_API_KEY", None)
@@ -295,18 +441,15 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
protocol: Protocol = "otlp_http"
if collector_endpoint:
# Parse the endpoint to determine protocol
if collector_endpoint.startswith("grpc://") or (
":4317" in collector_endpoint and "/v1/traces" not in collector_endpoint
):
endpoint = collector_endpoint
protocol = "otlp_grpc"
else:
# Phoenix Cloud endpoints (app.phoenix.arize.com) include the space in the URL
if "app.phoenix.arize.com" in collector_endpoint:
endpoint = collector_endpoint
protocol = "otlp_http"
# For other HTTP endpoints, ensure they have the correct path
elif "/v1/traces" not in collector_endpoint:
if collector_endpoint.endswith("/v1"):
endpoint = collector_endpoint + "/traces"
@@ -318,7 +461,6 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
endpoint = collector_endpoint
protocol = "otlp_http"
else:
# If no endpoint specified, self hosted phoenix
endpoint = "http://localhost:6006/v1/traces"
protocol = "otlp_http"
verbose_logger.debug(
@@ -329,12 +471,11 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
if api_key is not None:
otlp_auth_headers = f"Authorization=Bearer {api_key}"
elif "app.phoenix.arize.com" in endpoint:
# Phoenix Cloud requires an API key
raise ValueError(
"PHOENIX_API_KEY must be set when using Phoenix Cloud (app.phoenix.arize.com)."
)
project_name = os.environ.get("PHOENIX_PROJECT_NAME", "default")
project_name = os.environ.get("PHOENIX_PROJECT_NAME") or "default"
return ArizePhoenixConfig(
otlp_auth_headers=otlp_auth_headers,
@@ -343,8 +484,6 @@ class ArizePhoenixLogger(OpenTelemetry): # type: ignore
project_name=project_name,
)
## cannot suppress additional proxy server spans, removed previous methods.
async def async_health_check(self):
config = self.get_arize_phoenix_config()


@@ -144,7 +144,26 @@ class DatadogMetricsLogger(CustomBatchLogger):
}
self.log_queue.append(series_llm_latency)
# 3. Request Count / Status Code
# 3. LiteLLM Overhead Latency Metric (total - llm_api time)
hidden_params = log.get("hidden_params", {}) or {}
litellm_overhead_time_ms = hidden_params.get("litellm_overhead_time_ms")
if litellm_overhead_time_ms is not None:
overhead_tags = self._extract_tags(log) # no status_code on latency metric
series_overhead: DatadogMetricSeries = {
"metric": "litellm.overhead.latency",
"type": 3, # gauge
"points": [
{
"timestamp": timestamp,
"value": litellm_overhead_time_ms
/ 1000, # convert ms → seconds
}
],
"tags": overhead_tags,
}
self.log_queue.append(series_overhead)
# 4. Request Count / Status Code
series_count: DatadogMetricSeries = {
"metric": "litellm.llm_api.request_count",
"type": 1, # count


@@ -3910,31 +3910,6 @@ def _init_custom_logger_compatible_class( # noqa: PLR0915
endpoint=arize_phoenix_config.endpoint,
headers=arize_phoenix_config.otlp_auth_headers,
)
if arize_phoenix_config.project_name:
existing_attrs = os.environ.get("OTEL_RESOURCE_ATTRIBUTES", "")
# Add openinference.project.name attribute
if existing_attrs:
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = (
f"{existing_attrs},openinference.project.name={arize_phoenix_config.project_name}"
)
else:
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = (
f"openinference.project.name={arize_phoenix_config.project_name}"
)
# Set Phoenix project name from environment variable
phoenix_project_name = os.environ.get("PHOENIX_PROJECT_NAME", None)
if phoenix_project_name:
existing_attrs = os.environ.get("OTEL_RESOURCE_ATTRIBUTES", "")
# Add openinference.project.name attribute
if existing_attrs:
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = (
f"{existing_attrs},openinference.project.name={phoenix_project_name}"
)
else:
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = (
f"openinference.project.name={phoenix_project_name}"
)
# auth can be disabled on local deployments of arize phoenix
if arize_phoenix_config.otlp_auth_headers is not None:

@ -30,6 +30,9 @@ _IMAGE_RESPONSE_CALL_TYPES = frozenset(
}
)
# Pre-resolved DataResidency enum values for fast membership checks
_VALID_DATA_RESIDENCIES = frozenset(r.value for r in DataResidency)
def _is_above_128k(tokens: float) -> bool:
if tokens > 128000:
@ -636,7 +639,7 @@ def _get_regional_uplift_multiplier(
if data_residency is None:
return 1.0
residency = data_residency.lower()
if residency not in {r.value for r in DataResidency}:
if residency not in _VALID_DATA_RESIDENCIES:
return 1.0
multiplier = model_info.get(f"regional_processing_uplift_multiplier_{residency}")
if multiplier is None:

@ -108,10 +108,9 @@ class BaseConfig(ABC):
return type_to_response_format_param(response_format=response_format)
def is_thinking_enabled(self, non_default_params: dict) -> bool:
return (
non_default_params.get("thinking", {}).get("type") == "enabled"
or non_default_params.get("reasoning_effort") is not None
)
return (non_default_params.get("thinking") or {}).get(
"type"
) == "enabled" or non_default_params.get("reasoning_effort") is not None
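The `is_thinking_enabled` rewrite matters when a caller passes `thinking=None` explicitly. `dict.get("thinking", {})` substitutes `{}` only when the key is absent, so an explicit `None` reaches `.get("type")` and raises `AttributeError`. `(... or {})` covers both cases. The free-function sketch below mirrors the new expression outside the `BaseConfig` class.

```python
from typing import Any, Dict


def is_thinking_enabled(non_default_params: Dict[str, Any]) -> bool:
    # `or {}` handles both a missing "thinking" key and an explicit None.
    thinking = non_default_params.get("thinking") or {}
    return (
        thinking.get("type") == "enabled"
        or non_default_params.get("reasoning_effort") is not None
    )
```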
def is_max_tokens_in_request(self, non_default_params: dict) -> bool:
"""

@ -17919,22 +17919,9 @@
},
"github_copilot/claude-haiku-4.5": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_output_tokens": 16000,
"max_tokens": 16000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions"
],
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true
},
"github_copilot/claude-opus-4.5": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_output_tokens": 16000,
"max_tokens": 16000,
"max_input_tokens": 200000,
"max_output_tokens": 32000,
"max_tokens": 32000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions"
@ -17942,7 +17929,22 @@
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true,
"supports_minimal_reasoning_effort": true
"supports_reasoning": true
},
"github_copilot/claude-opus-4.5": {
"litellm_provider": "github_copilot",
"max_input_tokens": 200000,
"max_output_tokens": 32000,
"max_tokens": 32000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions"
],
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true,
"supports_minimal_reasoning_effort": true,
"supports_reasoning": true
},
"github_copilot/claude-opus-4.6-fast": {
"litellm_provider": "github_copilot",
@ -17957,6 +17959,22 @@
"supports_parallel_function_calling": true,
"supports_vision": true
},
"github_copilot/claude-opus-4.7": {
"litellm_provider": "github_copilot",
"max_input_tokens": 200000,
"max_output_tokens": 64000,
"max_tokens": 64000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions",
"/v1/messages"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_reasoning": true
},
"github_copilot/claude-opus-41": {
"litellm_provider": "github_copilot",
"max_input_tokens": 80000,
@ -17983,16 +18001,33 @@
},
"github_copilot/claude-sonnet-4.5": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_output_tokens": 16000,
"max_tokens": 16000,
"max_input_tokens": 200000,
"max_output_tokens": 32000,
"max_tokens": 32000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions"
],
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true
"supports_vision": true,
"supports_reasoning": true
},
"github_copilot/claude-sonnet-4.6": {
"litellm_provider": "github_copilot",
"max_input_tokens": 200000,
"max_output_tokens": 32000,
"max_tokens": 32000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions",
"/v1/messages"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_reasoning": true
},
"github_copilot/gemini-2.5-pro": {
"litellm_provider": "github_copilot",
@ -18002,7 +18037,25 @@
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true
"supports_vision": true,
"supported_endpoints": [
"/v1/chat/completions"
],
"supports_reasoning": true
},
"github_copilot/gemini-3-flash-preview": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_output_tokens": 64000,
"max_tokens": 64000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_reasoning": true
},
"github_copilot/gemini-3-pro-preview": {
"litellm_provider": "github_copilot",
@ -18014,13 +18067,30 @@
"supports_parallel_function_calling": true,
"supports_vision": true
},
"github_copilot/gemini-3.1-pro-preview": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_output_tokens": 64000,
"max_tokens": 64000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_reasoning": true
},
"github_copilot/gpt-3.5-turbo": {
"litellm_provider": "github_copilot",
"max_input_tokens": 16384,
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true
"supports_function_calling": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-3.5-turbo-0613": {
"litellm_provider": "github_copilot",
@ -18028,7 +18098,10 @@
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true
"supports_function_calling": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4": {
"litellm_provider": "github_copilot",
@ -18036,7 +18109,22 @@
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true
"supports_function_calling": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4-0125-preview": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions"
],
"supports_function_calling": true,
"supports_parallel_function_calling": true
},
"github_copilot/gpt-4-0613": {
"litellm_provider": "github_copilot",
@ -18044,16 +18132,22 @@
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true
"supports_function_calling": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4-o-preview": {
"litellm_provider": "github_copilot",
"max_input_tokens": 64000,
"max_input_tokens": 128000,
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true
"supports_parallel_function_calling": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4.1": {
"litellm_provider": "github_copilot",
@ -18064,7 +18158,10 @@
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true
"supports_vision": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4.1-2025-04-14": {
"litellm_provider": "github_copilot",
@ -18075,68 +18172,89 @@
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true
"supports_vision": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-41-copilot": {
"litellm_provider": "github_copilot",
"mode": "completion"
"mode": "chat"
},
"github_copilot/gpt-4o": {
"litellm_provider": "github_copilot",
"max_input_tokens": 64000,
"max_input_tokens": 128000,
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true
"supports_vision": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4o-2024-05-13": {
"litellm_provider": "github_copilot",
"max_input_tokens": 64000,
"max_input_tokens": 128000,
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true
"supports_vision": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4o-2024-08-06": {
"litellm_provider": "github_copilot",
"max_input_tokens": 64000,
"max_output_tokens": 16384,
"max_tokens": 16384,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true
},
"github_copilot/gpt-4o-2024-11-20": {
"litellm_provider": "github_copilot",
"max_input_tokens": 64000,
"max_input_tokens": 128000,
"max_output_tokens": 16384,
"max_tokens": 16384,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4o-2024-11-20": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_output_tokens": 16384,
"max_tokens": 16384,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_vision": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4o-mini": {
"litellm_provider": "github_copilot",
"max_input_tokens": 64000,
"max_input_tokens": 128000,
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true
"supports_parallel_function_calling": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-4o-mini-2024-07-18": {
"litellm_provider": "github_copilot",
"max_input_tokens": 64000,
"max_input_tokens": 128000,
"max_output_tokens": 4096,
"max_tokens": 4096,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true
"supports_parallel_function_calling": true,
"supported_endpoints": [
"/v1/chat/completions"
]
},
"github_copilot/gpt-5": {
"litellm_provider": "github_copilot",
@ -18155,14 +18273,19 @@
},
"github_copilot/gpt-5-mini": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_input_tokens": 264000,
"max_output_tokens": 64000,
"max_tokens": 64000,
"mode": "chat",
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true
"supports_vision": true,
"supported_endpoints": [
"/v1/chat/completions",
"/v1/responses"
],
"supports_reasoning": true
},
"github_copilot/gpt-5.1": {
"litellm_provider": "github_copilot",
@ -18195,7 +18318,7 @@
},
"github_copilot/gpt-5.2": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_input_tokens": 264000,
"max_output_tokens": 64000,
"max_tokens": 64000,
"mode": "chat",
@ -18206,11 +18329,27 @@
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true
"supports_vision": true,
"supports_reasoning": true
},
"github_copilot/gpt-5.2-codex": {
"litellm_provider": "github_copilot",
"max_input_tokens": 400000,
"max_output_tokens": 128000,
"max_tokens": 128000,
"mode": "responses",
"supported_endpoints": [
"/v1/responses"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_reasoning": true
},
"github_copilot/gpt-5.3-codex": {
"litellm_provider": "github_copilot",
"max_input_tokens": 128000,
"max_input_tokens": 400000,
"max_output_tokens": 128000,
"max_tokens": 128000,
"mode": "responses",
@ -18220,25 +18359,96 @@
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_vision": true
"supports_vision": true,
"supports_reasoning": true
},
"github_copilot/gpt-5.4": {
"litellm_provider": "github_copilot",
"max_input_tokens": 400000,
"max_output_tokens": 128000,
"max_tokens": 128000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions",
"/v1/responses"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_reasoning": true
},
"github_copilot/gpt-5.4-mini": {
"litellm_provider": "github_copilot",
"max_input_tokens": 400000,
"max_output_tokens": 128000,
"max_tokens": 128000,
"mode": "responses",
"supported_endpoints": [
"/v1/responses"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_reasoning": true
},
"github_copilot/gpt-5.5": {
"litellm_provider": "github_copilot",
"max_input_tokens": 400000,
"max_output_tokens": 128000,
"max_tokens": 128000,
"mode": "responses",
"supported_endpoints": [
"/v1/responses"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true,
"supports_reasoning": true
},
"github_copilot/oswe-vscode-prime": {
"litellm_provider": "github_copilot",
"max_input_tokens": 264000,
"max_output_tokens": 64000,
"max_tokens": 64000,
"mode": "chat",
"supported_endpoints": [
"/v1/chat/completions",
"/v1/responses"
],
"supports_vision": true,
"supports_function_calling": true,
"supports_parallel_function_calling": true,
"supports_response_schema": true
},
"github_copilot/text-embedding-3-small": {
"litellm_provider": "github_copilot",
"max_input_tokens": 8191,
"max_tokens": 8191,
"mode": "embedding"
"mode": "embedding",
"supported_endpoints": [
"/v1/embeddings"
]
},
"github_copilot/text-embedding-3-small-inference": {
"litellm_provider": "github_copilot",
"max_input_tokens": 8191,
"max_tokens": 8191,
"mode": "embedding"
"mode": "embedding",
"supported_endpoints": [
"/v1/embeddings"
]
},
"github_copilot/text-embedding-ada-002": {
"litellm_provider": "github_copilot",
"max_input_tokens": 8191,
"max_tokens": 8191,
"mode": "embedding"
"mode": "embedding",
"supported_endpoints": [
"/v1/embeddings"
]
},
"chatgpt/gpt-5.4": {
"litellm_provider": "chatgpt",
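Most `github_copilot/*` entries in this hunk gain a `supported_endpoints` list, and several gain `supports_reasoning`. Some models, such as `gpt-5.2-codex`, `gpt-5.4-mini` and `gpt-5.5`, are `"mode": "responses"` and list only `/v1/responses`. The sketch below copies a few entries from the hunks above, keeping a subset of their fields, and filters them by endpoint. `models_for_endpoint` is a hypothetical helper, not LiteLLM's own model-info lookup.

```python
from typing import Any, Dict, List

# A few entries copied from the hunks above (subset of fields).
MODEL_MAP: Dict[str, Dict[str, Any]] = {
    "github_copilot/gpt-5.4": {
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions", "/v1/responses"],
        "supports_reasoning": True,
    },
    "github_copilot/gpt-5.5": {
        "mode": "responses",
        "supported_endpoints": ["/v1/responses"],
        "supports_reasoning": True,
    },
    "github_copilot/gpt-4o": {
        "mode": "chat",
        "supported_endpoints": ["/v1/chat/completions"],
    },
    "github_copilot/text-embedding-3-small": {
        "mode": "embedding",
        "supported_endpoints": ["/v1/embeddings"],
    },
}


def models_for_endpoint(model_map: Dict[str, Dict[str, Any]], endpoint: str) -> List[str]:
    # Entries without "supported_endpoints" are skipped, not assumed compatible.
    return sorted(
        name
        for name, info in model_map.items()
        if endpoint in info.get("supported_endpoints", [])
    )
```

Skipping entries that lack the field is a conservative choice for this sketch. A router that treats a missing list as "chat completions only" would behave differently for older entries.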

@ -1,3 +1,4 @@
import html as _html
import json
from typing import Any, Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
@ -618,8 +619,105 @@ async def token_endpoint(
)
# Per RFC 6749 §4.1.2.1, an IdP that rejects an OAuth authorization request
# redirects back to the configured redirect URI with ``error`` /
# ``error_description`` / ``error_uri`` query params and no ``code``. The MCP
# loopback flow funnels that response through this /callback endpoint, so
# the endpoint must accept either a successful (``code``+``state``) or an
# error response. Declaring ``code``/``state`` as required would cause
# FastAPI to reject the error response with a 422 before the handler runs,
# which strands the MCP client waiting on the loopback (see LIT-2750).
def _render_oauth_error_html(error: str, description: Optional[str]) -> HTMLResponse:
"""Render an actionable HTML page for an IdP-reported OAuth error.
Used when we cannot propagate the error back to the registered
``redirect_uri`` (state missing or undecryptable). Returned with a 400
status so the failure is observable to operators while still being a
human-readable page for the end user.
"""
safe_error = _html.escape(error or "unknown_error")
safe_description = _html.escape(description) if description else ""
description_html = f"<p>{safe_description}</p>" if safe_description else ""
body = (
"<html><body>"
"<h2>Authentication failed</h2>"
f"<p><strong>Error:</strong> {safe_error}</p>"
f"{description_html}"
"<p>You can close this window and try again.</p>"
"</body></html>"
)
return HTMLResponse(body, status_code=400)
@router.get("/callback")
async def callback(request: Request, code: str, state: str):
async def callback(
request: Request,
code: Optional[str] = None,
state: Optional[str] = None,
error: Optional[str] = None,
error_description: Optional[str] = None,
error_uri: Optional[str] = None,
):
"""OAuth 2.0 authorization response handler for MCP loopback clients.
Accepts either:
- A successful authorization response (``code`` + ``state``), which is
forwarded back to the validated client ``redirect_uri`` with the
original (un-wrapped) ``state``.
- An error response (``error``[+``error_description``/``error_uri``]), per
RFC 6749 §4.1.2.1. When ``state`` is present and decodes to a trusted
``redirect_uri``, the error params are propagated back to the client so
its OAuth library can surface them. Otherwise we render an HTML error
page so the user is not left on an opaque 422 / blank screen.
"""
# 1. IdP-reported error path (e.g. ``?error=access_denied``).
if error:
verbose_logger.info(
"MCP /callback received IdP error: error=%s, error_description=%s",
error,
error_description,
)
if state:
try:
state_data = decode_state_hash(state)
original_state = state_data.get("original_state")
redirect_uri = _get_validated_client_redirect_uri(request, state_data)
except HTTPException:
# Untrusted/invalid client redirect_uri — surface inline rather
# than blindly forwarding the error to an attacker-controlled URL.
return _render_oauth_error_html(error, error_description)
except Exception:
# State could not be decrypted (expired key, tampered, etc.).
return _render_oauth_error_html(error, error_description)
params: Dict[str, str] = {"error": error}
if error_description:
params["error_description"] = error_description
if error_uri:
params["error_uri"] = error_uri
if original_state is not None:
params["state"] = original_state
complete_returned_url = _append_query_params(redirect_uri, params)
return RedirectResponse(url=complete_returned_url, status_code=302)
# No state — nothing to round-trip to. Show the user the error.
return _render_oauth_error_html(error, error_description)
# 2. Neither success nor error parameters present — most likely a stray
# GET / dropped SSO redirect chain. Surface a 400 instead of 422.
if not code or not state:
missing = [
name for name, value in (("code", code), ("state", state)) if not value
]
return _render_oauth_error_html(
"invalid_request",
f"Missing authorization {' and '.join(repr(m) for m in missing)} parameter(s).",
)
# 3. Successful authorization response.
try:
state_data = decode_state_hash(state)
original_state = state_data["original_state"]
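On the IdP-error path, the new `/callback` handler builds a redirect back to the client's trusted loopback `redirect_uri`. It appends `error`, the optional `error_description` and `error_uri`, and the client's original `state`. When no trusted URI is available, it renders an escaped 400 page instead. The body of `_append_query_params` is not part of this diff. The sketch below supplies a hypothetical version built from the `urllib.parse` functions imported at the top of the module, and it merges with any query string already on the URI. It assumes `redirect_uri` has already passed validation, which the real handler does through `decode_state_hash` and `_get_validated_client_redirect_uri`.

```python
import html
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def append_query_params(url: str, params: Dict[str, str]) -> str:
    # Hypothetical stand-in for _append_query_params: keep any query
    # already on the redirect URI and add (or overwrite) the new keys.
    parsed = urlparse(url)
    query = dict(parse_qsl(parsed.query, keep_blank_values=True))
    query.update(params)
    return urlunparse(parsed._replace(query=urlencode(query)))


def build_error_redirect(
    redirect_uri: str,
    error: str,
    error_description: Optional[str] = None,
    error_uri: Optional[str] = None,
    original_state: Optional[str] = None,
) -> str:
    # RFC 6749 section 4.1.2.1: "error" is required, the rest optional.
    params: Dict[str, str] = {"error": error}
    if error_description:
        params["error_description"] = error_description
    if error_uri:
        params["error_uri"] = error_uri
    if original_state is not None:
        params["state"] = original_state  # the client's own, un-wrapped state
    return append_query_params(redirect_uri, params)


def render_error_body(error: str, description: Optional[str]) -> str:
    # Both fields come from the query string, so escape before embedding.
    safe_error = html.escape(error or "unknown_error")
    desc = f"<p>{html.escape(description)}</p>" if description else ""
    return f"<p><strong>Error:</strong> {safe_error}</p>{desc}"
```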

@ -213,6 +213,12 @@ _EXTRA_BANNED_OBSERVABILITY_PARAMS: FrozenSet[str] = frozenset(
{
"posthog_api_url",
"phoenix_project_name",
"phoenix_project_name_override",
# Server-reserved: written exclusively by add_user_api_key_auth_to_request_metadata
# from the authenticated key's database record. A caller-supplied value
# would survive the server merge and let an authenticated user redirect
# their Arize/Phoenix telemetry into arbitrary projects.
"user_api_key_auth_metadata",
"wandb_api_key",
"weave_project_id",
}
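The comment in this hunk explains why `user_api_key_auth_metadata` joins the banned set. The server writes that field from the authenticated key's database record, so a caller-supplied copy must never survive the merge. The sketch below shows the intended effect with a hypothetical `merge_request_metadata`. It strips banned keys silently. This diff does not show how the proxy actually enforces the banned set, and it may reject such requests instead.

```python
from typing import Any, Dict, FrozenSet

# Subset of the banned names from the hunk above.
BANNED_CALLER_KEYS: FrozenSet[str] = frozenset(
    {"phoenix_project_name", "phoenix_project_name_override", "user_api_key_auth_metadata"}
)


def merge_request_metadata(
    caller_metadata: Dict[str, Any], key_db_metadata: Dict[str, Any]
) -> Dict[str, Any]:
    # Hypothetical: drop caller-supplied keys the server treats as banned...
    merged = {k: v for k, v in caller_metadata.items() if k not in BANNED_CALLER_KEYS}
    # ...then write the reserved field only from the authenticated key's record.
    merged["user_api_key_auth_metadata"] = dict(key_db_metadata)
    return merged
```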

@ -23,11 +23,11 @@ model_list:
model: bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0
#########################################################
########## batch specific params ########################
s3_bucket_name: litellm-proxy-941277531214
s3_bucket_name: litellm-proxy-123456789012
s3_region_name: us-west-2
s3_access_key_id: os.environ/AWS_ACCESS_KEY_ID
s3_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
aws_batch_role_arn: arn:aws:iam::941277531214:role/service-role/AmazonBedrockExecutionRoleForAgents_BB9HNW6V4CV
aws_batch_role_arn: arn:aws:iam::123456789012:role/service-role/AmazonBedrockExecutionRoleForAgents_EXAMPLE
model_info:
mode: batch

@ -471,10 +471,32 @@ async def _emit_management_endpoint_otel_span(
route = func.__name__
request_body = {}
_CREDENTIAL_FIELDS = frozenset(
{
"key",
"token",
"api_key",
"secret",
"password",
"access_token",
"refresh_token",
"private_key",
"service_account_key",
}
)
_response: Optional[dict] = None
if exception is None and result is not None:
try:
raw = dict(result)
_response = {k: v for k, v in raw.items() if k not in _CREDENTIAL_FIELDS}
except Exception:
_response = None
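The management-endpoint span now carries the endpoint's response with credential fields removed. The dict comprehension filters top-level keys only, so a credential nested inside a sub-dict such as `metadata` would still be logged. The sketch below mirrors the filtering and makes that limitation visible. `redact_top_level` is a hypothetical name.

```python
from typing import Any, Optional

_CREDENTIAL_FIELDS = frozenset(
    {
        "key", "token", "api_key", "secret", "password",
        "access_token", "refresh_token", "private_key", "service_account_key",
    }
)


def redact_top_level(result: Any) -> Optional[dict]:
    # Mirrors the hunk: anything dict() cannot convert means no response
    # is logged, and only top-level keys are filtered.
    try:
        raw = dict(result)
    except Exception:
        return None
    return {k: v for k, v in raw.items() if k not in _CREDENTIAL_FIELDS}
```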
logging_payload = ManagementEndpointLoggingPayload(
route=route,
request_data=request_body,
response=None,
response=_response,
start_time=start_time,
end_time=end_time,
exception=exception,

@ -908,7 +908,7 @@ async def pass_through_request( # noqa: PLR0915
else {"json": _parsed_body}
)
req = async_client.build_request(
"POST",
request.method,
url,
params=requested_query_params,
headers=headers,

@ -7,7 +7,6 @@ from litellm.integrations.arize.arize_phoenix import (
ArizePhoenixConfig,
ArizePhoenixLogger,
)
from litellm.integrations.arize._utils import ArizeOTELAttributes
class TestArizePhoenixConfig(unittest.TestCase):
@ -217,44 +216,147 @@ def test_get_arize_phoenix_config_expection_on_missing_api_key(monkeypatch, env_
# ---------------------------------------------------------------------------
# Dynamic project naming from metadata
# Per-project routing via Resource (not span attributes)
# ---------------------------------------------------------------------------
class TestGetDynamicProjectName:
"""Tests for _get_dynamic_project_name extraction logic."""
class TestResolveProjectName:
"""Tests for _resolve_project_name priority chain."""
def test_extracts_from_standard_logging_object_metadata(self):
def test_extracts_phoenix_name_from_standard_logging_object_metadata(self):
kwargs = {
"standard_logging_object": {
"metadata": {"phoenix_project_name": "my-project"},
}
}
assert ArizePhoenixLogger._get_dynamic_project_name(kwargs) == "my-project"
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "my-project"
def test_extracts_from_litellm_params_metadata(self):
def test_extracts_phoenix_name_from_litellm_params_metadata(self):
kwargs = {
"litellm_params": {
"metadata": {"phoenix_project_name": "sdk-project"},
}
}
assert ArizePhoenixLogger._get_dynamic_project_name(kwargs) == "sdk-project"
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "sdk-project"
def test_returns_none_when_no_metadata(self):
assert ArizePhoenixLogger._get_dynamic_project_name({}) is None
@patch.dict("os.environ", {"PHOENIX_PROJECT_NAME": "env-project"}, clear=False)
def test_falls_back_to_phoenix_env_when_no_metadata(self):
assert ArizePhoenixLogger._resolve_project_name({}) == "env-project"
@patch.dict(
"os.environ",
{"ARIZE_PROJECT_NAME": "arize-env", "PHOENIX_PROJECT_NAME": ""},
clear=False,
)
def test_falls_back_to_arize_env_when_phoenix_unset(self):
assert ArizePhoenixLogger._resolve_project_name({}) == "arize-env"
@patch.dict("os.environ", {}, clear=True)
def test_falls_back_to_default_when_no_metadata_or_env(self):
assert ArizePhoenixLogger._resolve_project_name({}) == "default"
def test_phoenix_override_beats_phoenix_metadata(self):
kwargs = {
"standard_logging_object": {
"metadata": {
"phoenix_project_name_override": "override-proj",
"phoenix_project_name": "phoenix-proj",
},
}
}
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "override-proj"
def test_whitespace_only_metadata_falls_through_to_default(self):
kwargs = {
"standard_logging_object": {
"metadata": {"phoenix_project_name_override": " "},
}
}
with patch.dict("os.environ", {}, clear=True):
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "default"
def test_strips_whitespace_from_project_name(self):
kwargs = {
"standard_logging_object": {
"metadata": {"phoenix_project_name": " trimmed "},
}
}
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "trimmed"
def test_non_dict_standard_logging_object_does_not_raise(self):
"""isinstance(dict) guard prevents AttributeError on non-dict payloads."""
kwargs = {"standard_logging_object": "not-a-dict"}
assert ArizePhoenixLogger._get_dynamic_project_name(kwargs) is None
with patch.dict("os.environ", {}, clear=True):
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "default"
def test_resolves_override_from_user_api_key_auth_metadata(self):
kwargs = {
"litellm_params": {
"metadata": {
"user_api_key_auth_metadata": {
"phoenix_project_name_override": "claude-code",
},
},
},
}
with patch.dict("os.environ", {}, clear=True):
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "claude-code"
def test_resolves_phoenix_name_from_user_api_key_auth_metadata(self):
kwargs = {
"standard_logging_object": {
"metadata": {
"user_api_key_auth_metadata": {
"phoenix_project_name": "team-project",
},
},
},
}
with patch.dict("os.environ", {}, clear=True):
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "team-project"
def test_proxy_ignores_client_metadata_when_auth_metadata_set(self):
kwargs = {
"litellm_params": {
"proxy_server_request": {
"url": "/v1/chat/completions",
"method": "POST",
"headers": {},
},
"metadata": {
"phoenix_project_name_override": "attacker-project",
"user_api_key_auth_metadata": {
"phoenix_project_name_override": "team-project",
},
},
},
}
with patch.dict("os.environ", {}, clear=True):
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "team-project"
def test_proxy_without_auth_metadata_falls_back_to_env(self):
kwargs = {
"litellm_params": {
"proxy_server_request": {
"url": "/v1/chat/completions",
"method": "POST",
"headers": {},
},
"metadata": {"phoenix_project_name": "attacker-project"},
},
}
with patch.dict(
"os.environ", {"PHOENIX_PROJECT_NAME": "env-project"}, clear=True
):
assert ArizePhoenixLogger._resolve_project_name(kwargs) == "env-project"
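Taken together, the `TestResolveProjectName` cases above pin down a priority chain:

1. Server-written `user_api_key_auth_metadata`.
2. Caller metadata, where `phoenix_project_name_override` beats `phoenix_project_name`. This source is ignored on proxy requests.
3. `PHOENIX_PROJECT_NAME`, then `ARIZE_PROJECT_NAME`.
4. `"default"`.

Values are stripped, and whitespace-only values count as unset. The real `_resolve_project_name` is not in this diff, so the sketch below is one implementation consistent with these tests. It treats the presence of `proxy_server_request` as the proxy signal, and it checks `standard_logging_object` before `litellm_params`. Both are assumptions, and the function names are hypothetical.

```python
import os
from typing import Any, Dict, Mapping, Optional


def _clean(value: Any) -> Optional[str]:
    # Non-strings and whitespace-only strings count as "not set".
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def _pick(metadata: Any) -> Optional[str]:
    # Within one metadata dict, the explicit override beats the plain name.
    if not isinstance(metadata, dict):
        return None
    return _clean(metadata.get("phoenix_project_name_override")) or _clean(
        metadata.get("phoenix_project_name")
    )


def resolve_project_name(
    kwargs: Dict[str, Any], env: Mapping[str, str] = os.environ
) -> str:
    slo = kwargs.get("standard_logging_object")
    params = kwargs.get("litellm_params")
    params = params if isinstance(params, dict) else {}
    metadatas = [
        m
        for m in (
            slo.get("metadata") if isinstance(slo, dict) else None,
            params.get("metadata"),
        )
        if isinstance(m, dict)
    ]
    # 1. Server-written metadata copied from the authenticated key record.
    for meta in metadatas:
        name = _pick(meta.get("user_api_key_auth_metadata"))
        if name:
            return name
    # 2. Caller-supplied metadata, trusted only outside the proxy (assumption:
    #    a request is a proxy request when "proxy_server_request" is present).
    if "proxy_server_request" not in params:
        for meta in metadatas:
            name = _pick(meta)
            if name:
                return name
    # 3. Environment variables, then the hard-coded default.
    return (
        _clean(env.get("PHOENIX_PROJECT_NAME"))
        or _clean(env.get("ARIZE_PROJECT_NAME"))
        or "default"
    )
```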
class TestDynamicProjectNameOnSpan:
"""set_arize_phoenix_attributes sets openinference.project.name on the span."""
class TestProjectNameNotOnSpan:
"""Project routing uses Resource on TracerProvider, not span attributes."""
@patch.dict("os.environ", {"PHOENIX_PROJECT_NAME": "env-fallback"}, clear=False)
@patch("litellm.integrations.arize._utils.set_attributes")
def test_dynamic_name_sets_span_attribute(self, _mock_set_attrs):
def test_set_arize_phoenix_attributes_does_not_set_project_on_span(
self, _mock_set_attrs
):
span = MagicMock()
kwargs = {
"standard_logging_object": {
@ -263,20 +365,468 @@ class TestDynamicProjectNameOnSpan:
}
ArizePhoenixLogger.set_arize_phoenix_attributes(span, kwargs, response_obj=None)
span.set_attribute.assert_called_once_with(
"openinference.project.name", "dynamic-proj"
for call in span.set_attribute.call_args_list:
assert call[0][0] != "openinference.project.name"
class TestPerProjectTracerProviderCache:
"""Spans for different projects use different Resources on export."""
def test_different_metadata_routes_to_different_resource(self):
from datetime import datetime
from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
InMemorySpanExporter,
)
@patch.dict("os.environ", {"PHOENIX_PROJECT_NAME": "env-project"}, clear=False)
@patch("litellm.integrations.arize._utils.set_attributes")
def test_falls_back_to_env_var_when_no_dynamic_name(self, _mock_set_attrs):
span = MagicMock()
ArizePhoenixLogger.set_arize_phoenix_attributes(span, {}, response_obj=None)
from litellm.integrations.opentelemetry import OpenTelemetryConfig
span.set_attribute.assert_called_once_with(
"openinference.project.name", "env-project"
exporter = InMemorySpanExporter()
logger = ArizePhoenixLogger(
config=OpenTelemetryConfig(exporter=exporter),
callback_name="arize_phoenix",
)
start = datetime(2024, 1, 1, 12, 0, 0)
end = datetime(2024, 1, 1, 12, 0, 1)
logger._handle_success(
{
"standard_logging_object": {
"metadata": {"phoenix_project_name": "project-a"},
},
},
response_obj={},
start_time=start,
end_time=end,
)
logger._handle_success(
{
"standard_logging_object": {
"metadata": {"phoenix_project_name": "project-b"},
},
},
response_obj={},
start_time=start,
end_time=end,
)
spans = exporter.get_finished_spans()
project_names = {
s.resource.attributes.get("openinference.project.name") for s in spans
}
assert "project-a" in project_names
assert "project-b" in project_names
def test_shared_span_processor_created_once_at_init(self):
from litellm.integrations.opentelemetry import (
OpenTelemetry,
OpenTelemetryConfig,
)
mock_processor = MagicMock()
with patch.object(
OpenTelemetry, "_get_span_processor", return_value=mock_processor
) as mock_get_processor:
logger = ArizePhoenixLogger(
config=OpenTelemetryConfig(exporter=MagicMock()),
callback_name="arize_phoenix",
)
assert mock_get_processor.call_count == 1
assert logger._shared_span_processor is mock_processor
logger._project_providers.clear()
logger._get_tracer_for("project-a")
logger._get_tracer_for("project-b")
assert mock_get_processor.call_count == 1
def test_lru_eviction_does_not_shutdown_provider(self):
from litellm.integrations.opentelemetry import OpenTelemetryConfig
logger = ArizePhoenixLogger(
config=OpenTelemetryConfig(exporter=MagicMock()),
callback_name="arize_phoenix",
)
logger._project_providers.clear()
logger._get_tracer_for("project-0")
evicted_provider = logger._project_providers["project-0"]
shutdown_mock = MagicMock()
evicted_provider.shutdown = shutdown_mock # type: ignore[method-assign]
for i in range(1, 65):
logger._get_tracer_for(f"project-{i}")
assert len(logger._project_providers) == 64
assert "project-0" not in logger._project_providers
assert "project-64" in logger._project_providers
shutdown_mock.assert_not_called()
def test_flush_tracer_providers_force_flushes_shared_processor(self):
from litellm.integrations.opentelemetry import OpenTelemetryConfig
logger = ArizePhoenixLogger(
config=OpenTelemetryConfig(exporter=MagicMock()),
callback_name="arize_phoenix",
)
mock_processor = MagicMock()
logger._shared_span_processor = mock_processor
mock_provider = MagicMock()
logger._project_providers["proj"] = mock_provider
logger.flush_tracer_providers()
mock_processor.force_flush.assert_called_once()
mock_provider.force_flush.assert_called_once()
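The cache tests describe a per-project provider map with these properties:

- It is bounded at 64 entries and evicts the least recently used project.
- It never calls `shutdown()` on an evicted provider. In OpenTelemetry's Python SDK, `TracerProvider.shutdown()` shuts down the provider's span processors, and here every provider shares one processor.
- Concurrent misses for the same project should produce a single entry.

`LruProviderCache` below is a hypothetical, generic sketch of those properties built on `OrderedDict` and a lock. It is not the logger's actual `_get_tracer_for` code.

```python
import threading
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class LruProviderCache(Generic[T]):
    """Hypothetical per-project cache bounded to `maxsize` entries."""

    def __init__(self, build: Callable[[str], T], maxsize: int = 64) -> None:
        self._build = build
        self._maxsize = maxsize
        self._items: "OrderedDict[str, T]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, project: str) -> T:
        with self._lock:
            if project in self._items:
                self._items.move_to_end(project)  # mark as most recently used
                return self._items[project]
            # Building under the lock means concurrent misses for the same
            # project build once, at the cost of serialising all builds.
            item = self._build(project)
            self._items[project] = item
            if len(self._items) > self._maxsize:
                # Drop the least recently used entry without shutting it down,
                # since a shared span processor may still be exporting its spans.
                self._items.popitem(last=False)
            return item

    def __contains__(self, project: str) -> bool:
        with self._lock:
            return project in self._items

    def __len__(self) -> int:
        with self._lock:
            return len(self._items)
```

Holding the lock during `build` is the simplest way to get "insert once" semantics. A cache with expensive builds might instead use a per-key lock or future.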
class TestGetLitellmResourceForProject:
"""Resource attrs used by Phoenix OSS and Arize AX for project routing."""
def test_project_attrs_win_over_otel_resource_attributes_env(self):
from litellm.integrations.opentelemetry import OpenTelemetryConfig
logger = ArizePhoenixLogger(
config=OpenTelemetryConfig(exporter=MagicMock()),
callback_name="arize_phoenix",
)
with patch.dict(
"os.environ",
{
"OTEL_RESOURCE_ATTRIBUTES": "openinference.project.name=env-pinned,model_id=env-model"
},
clear=False,
):
resource = logger._get_litellm_resource_for_project("dynamic-proj")
assert resource.attributes["openinference.project.name"] == "dynamic-proj"
assert resource.attributes["model_id"] == "dynamic-proj"
assert resource.attributes["service.name"] == "dynamic-proj"
@patch.dict("os.environ", {"OTEL_DEPLOYMENT_ENVIRONMENT": "staging"}, clear=False)
def test_preserves_deployment_environment_from_config(self):
from litellm.integrations.opentelemetry import OpenTelemetryConfig
logger = ArizePhoenixLogger(
config=OpenTelemetryConfig(
exporter=MagicMock(), deployment_environment="staging"
),
callback_name="arize_phoenix",
)
resource = logger._get_litellm_resource_for_project("my-proj")
assert resource.attributes.get("deployment.environment") == "staging"
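`TestGetLitellmResourceForProject` expects the per-project resource to overwrite `openinference.project.name`, `model_id` and `service.name` with the project name, even when `OTEL_RESOURCE_ATTRIBUTES` pins them. It also expects `deployment.environment` from the config to be preserved. The class docstring ties these attributes to project routing in Phoenix OSS and Arize AX. The sketch below models that precedence with plain dicts instead of OpenTelemetry `Resource` objects. Both function names are hypothetical. The sketch also keeps unrelated env attributes, which the tests above do not check.

```python
from typing import Dict, Optional
from urllib.parse import unquote


def parse_otel_resource_attributes(raw: str) -> Dict[str, str]:
    # OTEL_RESOURCE_ATTRIBUTES is a comma-separated list of key=value pairs
    # with percent-encoded values.
    attrs: Dict[str, str] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def resource_attrs_for_project(
    project_name: str, env_raw: str = "", deployment_environment: Optional[str] = None
) -> Dict[str, str]:
    attrs = parse_otel_resource_attributes(env_raw)
    if deployment_environment:
        attrs["deployment.environment"] = deployment_environment
    # Applied last so the per-request project wins over env-pinned values.
    attrs.update(
        {
            "openinference.project.name": project_name,
            "model_id": project_name,
            "service.name": project_name,
        }
    )
    return attrs
```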


class TestTracerResolutionAndCache:
    """_resolve_tracer_for_kwargs, get_tracer_to_use_for_request, provider cache."""

    def test_get_tracer_to_use_for_request_matches_resolve_tracer(self):
        from litellm.integrations.opentelemetry import OpenTelemetryConfig

        logger = ArizePhoenixLogger(
            config=OpenTelemetryConfig(exporter=MagicMock()),
            callback_name="arize_phoenix",
        )
        kwargs = {
            "standard_logging_object": {
                "metadata": {"phoenix_project_name": "same-proj"},
            }
        }
        project_name, _ = logger._resolve_tracer_for_kwargs(kwargs)
        tracer_from_request = logger.get_tracer_to_use_for_request(kwargs)
        assert project_name == "same-proj"
        assert "same-proj" in logger._project_providers
        assert logger._resolve_project_name(kwargs) == project_name
        assert tracer_from_request is not None

    def test_cache_reuses_provider_for_same_project(self):
        from litellm.integrations.opentelemetry import OpenTelemetryConfig

        logger = ArizePhoenixLogger(
            config=OpenTelemetryConfig(exporter=MagicMock()),
            callback_name="arize_phoenix",
        )
        logger._project_providers.clear()
        logger._get_tracer_for("cached-proj")
        provider_first = logger._project_providers["cached-proj"]
        logger._get_tracer_for("cached-proj")
        provider_second = logger._project_providers["cached-proj"]
        assert provider_first is provider_second
        assert len(logger._project_providers) == 1

    def test_parallel_cache_miss_for_same_project_inserts_once(self):
        import threading

        from litellm.integrations.opentelemetry import OpenTelemetryConfig

        logger = ArizePhoenixLogger(
            config=OpenTelemetryConfig(exporter=MagicMock()),
            callback_name="arize_phoenix",
        )
        logger._project_providers.clear()
        build_calls: list[str] = []
        real_build = logger._build_tracer_provider_for_project

        def tracking_build(project_name: str):
            build_calls.append(project_name)
            return real_build(project_name)

        barrier = threading.Barrier(10)
        errors: list[Exception] = []

        def worker() -> None:
            try:
                barrier.wait()
                logger._get_tracer_for("race-proj")
            except Exception as exc:
                errors.append(exc)

        with patch.object(
            logger,
            "_build_tracer_provider_for_project",
            side_effect=tracking_build,
        ):
            threads = [threading.Thread(target=worker) for _ in range(10)]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()
        assert not errors
        assert len(logger._project_providers) == 1
        assert "race-proj" in logger._project_providers
        assert len(build_calls) >= 1

    def test_injected_tracer_provider_bypasses_project_cache(self):
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import SimpleSpanProcessor
        from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
            InMemorySpanExporter,
        )

        from litellm.integrations.opentelemetry import OpenTelemetryConfig

        exporter = InMemorySpanExporter()
        provider = TracerProvider()
        provider.add_span_processor(SimpleSpanProcessor(exporter))
        logger = ArizePhoenixLogger(
            config=OpenTelemetryConfig(exporter=exporter),
            callback_name="arize_phoenix",
            tracer_provider=provider,
        )
        assert getattr(logger, "_use_injected_tracer_provider", False) is True
        assert not hasattr(logger, "_project_providers") or not getattr(
            logger, "_project_providers", None
        )
        tracer_a = logger._get_tracer_for("any-project")
        tracer_b = logger.get_tracer_to_use_for_request(
            {"standard_logging_object": {"metadata": {"phoenix_project_name": "x"}}}
        )
        assert tracer_a is logger.tracer
        assert tracer_b is logger.tracer

    def test_flush_tracer_providers_noop_for_injected_provider(self):
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import SimpleSpanProcessor
        from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
            InMemorySpanExporter,
        )

        from litellm.integrations.opentelemetry import OpenTelemetryConfig

        exporter = InMemorySpanExporter()
        provider = TracerProvider()
        provider.add_span_processor(SimpleSpanProcessor(exporter))
        logger = ArizePhoenixLogger(
            config=OpenTelemetryConfig(exporter=exporter),
            callback_name="arize_phoenix",
            tracer_provider=provider,
        )
        logger.flush_tracer_providers()
        exporter.shutdown()

    def test_standard_logging_metadata_wins_over_litellm_params(self):
        kwargs = {
            "standard_logging_object": {
                "metadata": {"phoenix_project_name_override": "from-logging"},
            },
            "litellm_params": {
                "metadata": {"phoenix_project_name_override": "from-params"},
            },
        }
        assert ArizePhoenixLogger._resolve_project_name(kwargs) == "from-logging"
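The resolution tests above pin only a few precedence rules: metadata on `standard_logging_object` beats metadata on `litellm_params`, a plain `phoenix_project_name` is honored, and a `phoenix_project_name_override` nested under `user_api_key_auth_metadata` is honored. A minimal sketch of a resolver consistent with those cases follows. The names `resolve_project_name` and `_from_metadata`, the order in which the two keys are checked, checking direct metadata before the nested auth metadata, and returning None instead of a default are all assumptions, not the ArizePhoenixLogger implementation.

```python
from typing import Any, Dict, Optional

# Hypothetical lookup order: the override key first, then the plain name.
PROJECT_KEYS = ("phoenix_project_name_override", "phoenix_project_name")


def _from_metadata(metadata: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return the first project name found in one metadata dict, or None."""
    if not isinstance(metadata, dict):
        return None
    # Check the metadata itself, then the per-key auth metadata nested inside it.
    for scope in (metadata, metadata.get("user_api_key_auth_metadata")):
        if isinstance(scope, dict):
            for key in PROJECT_KEYS:
                value = scope.get(key)
                if value:
                    return value
    return None


def resolve_project_name(kwargs: Dict[str, Any]) -> Optional[str]:
    """Hypothetical resolver: standard logging metadata wins over litellm_params."""
    standard_logging = kwargs.get("standard_logging_object") or {}
    litellm_params = kwargs.get("litellm_params") or {}
    return _from_metadata(standard_logging.get("metadata")) or _from_metadata(
        litellm_params.get("metadata")
    )
```

Falling back from one source to the next with `or` keeps each lookup independent, so adding another metadata source would only extend the final expression.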


class TestPhoenixTraceHandling:
    """_handle_success / _handle_failure span export behavior."""

    def test_handle_failure_sets_error_status_on_request_span(self):
        from datetime import datetime

        from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
            InMemorySpanExporter,
        )
        from opentelemetry.trace import StatusCode

        from litellm.integrations.opentelemetry import (
            LITELLM_REQUEST_SPAN_NAME,
            OpenTelemetryConfig,
        )

        exporter = InMemorySpanExporter()
        logger = ArizePhoenixLogger(
            config=OpenTelemetryConfig(exporter=exporter),
            callback_name="arize_phoenix",
        )
        start = datetime(2024, 1, 1, 12, 0, 0)
        end = datetime(2024, 1, 1, 12, 0, 1)
        logger._handle_failure(
            {
                "standard_logging_object": {
                    "metadata": {"phoenix_project_name": "fail-proj"},
                },
                "exception": Exception("boom"),
            },
            response_obj=None,
            start_time=start,
            end_time=end,
        )
        spans = exporter.get_finished_spans()
        request_spans = [s for s in spans if s.name == LITELLM_REQUEST_SPAN_NAME]
        assert len(request_spans) == 1
        assert request_spans[0].status.status_code == StatusCode.ERROR
        assert (
            request_spans[0].resource.attributes.get("openinference.project.name")
            == "fail-proj"
        )

    def test_proxy_mode_parent_and_child_share_trace_id(self):
        from datetime import datetime

        from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
            InMemorySpanExporter,
        )

        from litellm.integrations.opentelemetry import (
            LITELLM_REQUEST_SPAN_NAME,
            OpenTelemetryConfig,
        )

        exporter = InMemorySpanExporter()
        logger = ArizePhoenixLogger(
            config=OpenTelemetryConfig(exporter=exporter),
            callback_name="arize_phoenix",
        )
        start = datetime(2024, 1, 1, 12, 0, 0)
        end = datetime(2024, 1, 1, 12, 0, 1)
        logger._handle_success(
            {
                "litellm_params": {
                    "proxy_server_request": {
                        "url": "/chat/completions",
                        "method": "POST",
                        "headers": {},
                    },
                    "metadata": {
                        "user_api_key_auth_metadata": {
                            "phoenix_project_name_override": "proxy-proj",
                        },
                    },
                },
            },
            response_obj={},
            start_time=start,
            end_time=end,
        )
        spans = exporter.get_finished_spans()
        span_names = {s.name for s in spans}
        assert "litellm_proxy_request" in span_names
        assert LITELLM_REQUEST_SPAN_NAME in span_names
        trace_ids = {s.context.trace_id for s in spans}
        assert len(trace_ids) == 1
        for span in spans:
            assert (
                span.resource.attributes.get("openinference.project.name")
                == "proxy-proj"
            )

    def test_override_routes_all_spans_to_one_project_in_single_request(self):
        from datetime import datetime

        from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
            InMemorySpanExporter,
        )

        from litellm.integrations.opentelemetry import OpenTelemetryConfig

        exporter = InMemorySpanExporter()
        logger = ArizePhoenixLogger(
            config=OpenTelemetryConfig(exporter=exporter),
            callback_name="arize_phoenix",
        )
        start = datetime(2024, 1, 1, 12, 0, 0)
        end = datetime(2024, 1, 1, 12, 0, 1)
        logger._handle_success(
            {
                "standard_logging_object": {
                    "metadata": {
                        "user_api_key_auth_metadata": {
                            "phoenix_project_name_override": "unified-proj",
                        },
                    },
                },
                "litellm_params": {
                    "proxy_server_request": {
                        "url": "/v1/chat/completions",
                        "method": "POST",
                        "headers": {},
                    },
                },
            },
            response_obj={"id": "resp-1"},
            start_time=start,
            end_time=end,
        )
        for span in exporter.get_finished_spans():
            assert (
                span.resource.attributes.get("openinference.project.name")
                == "unified-proj"
            )
            assert span.resource.attributes.get("model_id") == "unified-proj"


class TestGetArizePhoenixConfigProjectName:
    @patch.dict(
        "os.environ", {"PHOENIX_PROJECT_NAME": "phoenix-config-proj"}, clear=True
    )
    def test_project_name_from_phoenix_env(self):
        config = ArizePhoenixLogger.get_arize_phoenix_config()
        assert config.project_name == "phoenix-config-proj"

    @patch.dict("os.environ", {}, clear=True)
    def test_project_name_defaults_when_env_unset(self):
        config = ArizePhoenixLogger.get_arize_phoenix_config()
        assert config.project_name == "default"


if __name__ == "__main__":
    unittest.main()
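The cache tests above expect at most 64 per-project providers, eviction of the oldest entry ("project-0" gone, "project-64" kept), reuse of the same provider for a repeated project, and a single cached entry when ten threads miss on the same project at once. A minimal stdlib sketch of a cache with those properties follows. `BoundedProviderCache` and its methods are hypothetical names, and least-recently-used ordering via `OrderedDict` is an assumed policy, not necessarily what ArizePhoenixLogger does.

```python
import threading
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

V = TypeVar("V")


class BoundedProviderCache(Generic[V]):
    """Hypothetical per-project cache: at most max_size entries, LRU eviction,
    and a lock so concurrent misses for one key store a single value."""

    def __init__(self, build: Callable[[str], V], max_size: int = 64) -> None:
        self._build = build
        self._max_size = max_size
        self._entries: "OrderedDict[str, V]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key: str) -> V:
        with self._lock:
            if key in self._entries:
                # Cache hit: mark the entry as most recently used.
                self._entries.move_to_end(key)
                return self._entries[key]
            # Cache miss: build while holding the lock, so exactly one build runs per key.
            value = self._build(key)
            self._entries[key] = value
            if len(self._entries) > self._max_size:
                # Drop the least recently used entry; it is not shut down here.
                self._entries.popitem(last=False)
            return value

    def __len__(self) -> int:
        return len(self._entries)

    def __contains__(self, key: object) -> bool:
        return key in self._entries
```

Building while holding the lock guarantees one build per key but serializes builds for unrelated projects. An implementation that builds outside the lock and keeps only the first value inserted would also satisfy the race test, which is consistent with that test asserting only `len(build_calls) >= 1`.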

@@ -104,6 +104,7 @@ async def test_add_metrics_from_log(clean_env):
    logger._add_metrics_from_log(log=payload, kwargs=kwargs, status_code="200")
    # Should have 3 series: total_latency, llm_api_latency, request_count
    # (no overhead metric because payload has no hidden_params litellm_overhead_time_ms)
    assert len(logger.log_queue) == 3
    metrics = {s["metric"]: s for s in logger.log_queue}
@@ -125,6 +126,72 @@ async def test_add_metrics_from_log(clean_env):
    assert "status_code:200" in count["tags"]


@pytest.mark.asyncio
async def test_overhead_latency_metric_emitted(clean_env):
    """Test that litellm.overhead.latency is emitted when hidden_params contains litellm_overhead_time_ms."""
    logger = DatadogMetricsLogger(batch_size=100, start_periodic_flush=False)
    now = datetime.now()
    start_time = now - timedelta(seconds=2)
    api_call_start_time = now - timedelta(seconds=1)
    payload = StandardLoggingPayload(
        custom_llm_provider="openai",
        model="gpt-4o",
        hidden_params={
            "litellm_overhead_time_ms": 250.0,  # 250 ms of overhead
        },
    )
    kwargs = {
        "start_time": start_time,
        "api_call_start_time": api_call_start_time,
        "end_time": now,
    }
    logger._add_metrics_from_log(log=payload, kwargs=kwargs, status_code="200")
    metrics = {s["metric"]: s for s in logger.log_queue}
    # Overhead metric must be present
    assert (
        "litellm.overhead.latency" in metrics
    ), f"Expected 'litellm.overhead.latency' in emitted metrics, got: {list(metrics.keys())}"
    overhead = metrics["litellm.overhead.latency"]
    assert overhead["type"] == 3  # gauge
    # 250 ms → 0.25 s
    assert abs(overhead["points"][0]["value"] - 0.25) < 1e-6
    # status_code should NOT be in overhead tags (it is a latency metric, not a request count)
    assert not any(tag.startswith("status_code:") for tag in overhead["tags"])


@pytest.mark.asyncio
async def test_overhead_latency_metric_absent_when_no_hidden_params(clean_env):
    """Test that litellm.overhead.latency is NOT emitted when hidden_params has no overhead value."""
    logger = DatadogMetricsLogger(batch_size=100, start_periodic_flush=False)
    now = datetime.now()
    start_time = now - timedelta(seconds=2)
    api_call_start_time = now - timedelta(seconds=1)
    payload = StandardLoggingPayload(
        custom_llm_provider="openai",
        model="gpt-4o",
        # No hidden_params / no litellm_overhead_time_ms
    )
    kwargs = {
        "start_time": start_time,
        "api_call_start_time": api_call_start_time,
        "end_time": now,
    }
    logger._add_metrics_from_log(log=payload, kwargs=kwargs, status_code="200")
    metrics = {s["metric"]: s for s in logger.log_queue}
    assert "litellm.overhead.latency" not in metrics
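The two overhead tests describe a conversion rule: when `hidden_params` carries `litellm_overhead_time_ms`, a `litellm.overhead.latency` series of type 3 (gauge) is emitted with its value in seconds and without a `status_code:` tag; otherwise nothing is emitted. A minimal sketch of that rule as a standalone function follows. `build_overhead_series` is a hypothetical helper, and the dict layout only mirrors the keys the tests read (`metric`, `type`, `points`, `tags`), not DatadogMetricsLogger's internals.

```python
import time
from typing import Any, Dict, List, Optional

GAUGE = 3  # series "type" value the tests assert for this metric (gauge)


def build_overhead_series(
    hidden_params: Optional[Dict[str, Any]],
    base_tags: List[str],
    timestamp: Optional[int] = None,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return a litellm.overhead.latency series, or None
    when no overhead value was recorded for the request."""
    overhead_ms = (hidden_params or {}).get("litellm_overhead_time_ms")
    if overhead_ms is None:
        return None
    ts = timestamp if timestamp is not None else int(time.time())
    return {
        "metric": "litellm.overhead.latency",
        "type": GAUGE,
        # Latency is reported in seconds, so 250 ms becomes 0.25.
        "points": [{"timestamp": ts, "value": overhead_ms / 1000.0}],
        # A latency gauge is not a request count, so status_code tags are dropped.
        "tags": [t for t in base_tags if not t.startswith("status_code:")],
    }
```

For example, `build_overhead_series({"litellm_overhead_time_ms": 250.0}, ["provider:openai", "status_code:200"])` yields a point value of 0.25 with only the `provider:openai` tag (the tag names here are illustrative).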


@pytest.mark.asyncio
async def test_async_log_success_event(clean_env):
    """Test that success events are added to the queue."""

@@ -0,0 +1,210 @@
"""Regression tests for LIT-2750.

The MCP OAuth ``/callback`` endpoint must handle IdP error responses
(e.g. ``?error=access_denied``) gracefully instead of returning a 422
because ``code`` and ``state`` were declared as required FastAPI query
params. Per RFC 6749 §4.1.2.1 the IdP redirects to the configured
redirect URI with ``error`` / ``error_description`` / ``error_uri``
query params and no ``code`` when the user denies access.

These tests cover both the propagate-to-client path (when state decodes
to a trusted ``redirect_uri``) and the in-page fallback (when state is
missing, undecryptable, or carries an untrusted redirect_uri). They also
pin the success path (``code`` + ``state``) against accidental
regressions.
"""

import pytest


@pytest.fixture(autouse=True)
def _mock_mcp_client_ip():
    """Bypass IP-based access control for the in-process TestClient.

    Mirrors the autouse fixture in ``test_discoverable_endpoints.py`` so
    these tests don't require a real client IP context.
    """
    from unittest.mock import patch

    with patch(
        "litellm.proxy._experimental.mcp_server.discoverable_endpoints.IPAddressUtils.get_mcp_client_ip",
        return_value=None,
    ):
        yield


@pytest.fixture
def callback_test_client(monkeypatch):
    """FastAPI TestClient mounted with the MCP discoverable router.

    Sets a deterministic ``LITELLM_SALT_KEY`` so encoded states minted
    in-test can be decrypted by the handler.
    """
    from fastapi import FastAPI
    from fastapi.testclient import TestClient

    monkeypatch.setenv("LITELLM_SALT_KEY", "sk-test-salt-for-LIT-2750")
    from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
        router,
    )

    app = FastAPI()
    app.include_router(router)
    return TestClient(app)


class TestCallbackOAuthErrorResponses:
    """LIT-2750: IdP error responses to ``/callback`` must not 422."""

    def test_idp_error_with_no_state_returns_400_html(self, callback_test_client):
        """Pre-fix: 422 Pydantic. Post-fix: 400 HTML with the IdP's error."""
        resp = callback_test_client.get(
            "/callback",
            params={
                "error": "access_denied",
                "error_description": "User declined access",
            },
            follow_redirects=False,
        )
        assert resp.status_code == 400
        assert "text/html" in resp.headers["content-type"]
        body = resp.text
        assert "access_denied" in body
        assert "User declined access" in body
        # Sanity: must not leak the Pydantic validation error.
        assert "Field required" not in body

    def test_idp_error_html_escapes_user_controlled_fields(
        self, callback_test_client
    ):
        """A malicious IdP must not be able to inject HTML/JS via error params."""
        resp = callback_test_client.get(
            "/callback",
            params={
                "error": "<script>alert(1)</script>",
                "error_description": "<img src=x onerror=alert(2)>",
            },
            follow_redirects=False,
        )
        assert resp.status_code == 400
        body = resp.text
        # Raw tags must be escaped, not present verbatim.
        assert "<script>alert(1)</script>" not in body
        assert "<img src=x onerror=alert(2)>" not in body
        assert "&lt;script&gt;alert(1)&lt;/script&gt;" in body

    def test_idp_error_with_trusted_state_propagates_to_client_redirect_uri(
        self, callback_test_client
    ):
        """When state decodes to a trusted (loopback) redirect_uri, propagate
        the error back so the MCP client's OAuth library can surface it
        instead of timing out waiting on the loopback."""
        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
            encode_state_with_base_url,
        )

        state = encode_state_with_base_url(
            base_url="http://localhost:3000/",
            original_state="client-original-state-xyz",
            client_redirect_uri="http://127.0.0.1:60108/callback",
        )
        resp = callback_test_client.get(
            "/callback",
            params={
                "error": "access_denied",
                "error_description": "User declined access",
                "state": state,
            },
            follow_redirects=False,
        )
        assert resp.status_code == 302
        location = resp.headers["location"]
        assert location.startswith("http://127.0.0.1:60108/callback?")
        assert "error=access_denied" in location
        # Original client state must be round-tripped, not our wrapped state.
        assert "state=client-original-state-xyz" in location
        # error_description percent-encoded but present.
        assert "error_description=User" in location
        # Wrapped/encrypted state must NOT leak to the client.
        assert state not in location

    def test_idp_error_with_untrusted_redirect_uri_does_not_open_redirect(
        self, callback_test_client
    ):
        """If the state minted earlier carries a redirect_uri that the proxy
        no longer trusts, we must surface the error inline rather than
        302-ing to an attacker-controlled URL (open redirect)."""
        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
            encode_state_with_base_url,
        )

        state = encode_state_with_base_url(
            base_url="http://localhost:3000/",
            original_state="x",
            client_redirect_uri="https://attacker.example.com/steal",
        )
        resp = callback_test_client.get(
            "/callback",
            params={"error": "access_denied", "state": state},
            follow_redirects=False,
        )
        # Must not 3xx: an open redirect would defeat the redirect_uri allowlist.
        assert resp.status_code == 400
        assert "attacker.example.com" not in resp.headers.get("location", "")
        assert "access_denied" in resp.text

    def test_idp_error_with_undecryptable_state_falls_back_to_html(
        self, callback_test_client
    ):
        resp = callback_test_client.get(
            "/callback",
            params={
                "error": "server_error",
                "error_description": "boom",
                "state": "not-a-valid-encrypted-state",
            },
            follow_redirects=False,
        )
        assert resp.status_code == 400
        assert "server_error" in resp.text
        assert "boom" in resp.text

    def test_bare_callback_with_no_params_returns_400_not_422(
        self, callback_test_client
    ):
        """An SSO redirect chain that drops the original /authorize query
        params should land on a human-readable 400, not a Pydantic 422."""
        resp = callback_test_client.get("/callback", follow_redirects=False)
        assert resp.status_code == 400
        assert "invalid_request" in resp.text
        assert "Field required" not in resp.text

    def test_success_path_still_redirects_with_code_and_state(
        self, callback_test_client
    ):
        """Regression: the successful (``code``+``state``) flow must still
        redirect back to the trusted client redirect_uri with the original
        state preserved."""
        from litellm.proxy._experimental.mcp_server.discoverable_endpoints import (
            encode_state_with_base_url,
        )

        state = encode_state_with_base_url(
            base_url="http://localhost:3000/",
            original_state="orig-state-success",
            client_redirect_uri="http://127.0.0.1:60108/callback",
        )
        resp = callback_test_client.get(
            "/callback",
            params={"code": "auth-code-abc", "state": state},
            follow_redirects=False,
        )
        assert resp.status_code == 302
        location = resp.headers["location"]
        assert location.startswith("http://127.0.0.1:60108/callback?")
        assert "code=auth-code-abc" in location
        assert "state=orig-state-success" in location
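The callback tests above describe two branches for an RFC 6749 §4.1.2.1 error response: propagate `error`, `error_description`, `error_uri` and the client's original state to a trusted redirect URI, or render an HTML-escaped 400 page. A minimal sketch of that decision follows, assuming the state has already been decoded into `client_redirect_uri` and `original_state`. `handle_idp_error` and `is_trusted_redirect_uri` are hypothetical names, and the loopback-only allowlist is an assumption; the proxy's real trust check may differ.

```python
import html
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode, urlsplit


def is_trusted_redirect_uri(uri: str) -> bool:
    """Hypothetical allowlist: only plain-http loopback redirect URIs are trusted."""
    parts = urlsplit(uri)
    return parts.scheme == "http" and parts.hostname in ("127.0.0.1", "localhost", "::1")


def handle_idp_error(
    error: str,
    error_description: Optional[str],
    error_uri: Optional[str],
    client_redirect_uri: Optional[str],
    original_state: Optional[str],
) -> Tuple[int, str]:
    """Return (status, Location URL or HTML body) for an IdP error response."""
    if client_redirect_uri and is_trusted_redirect_uri(client_redirect_uri):
        params: Dict[str, str] = {"error": error}
        if error_description:
            params["error_description"] = error_description
        if error_uri:
            params["error_uri"] = error_uri
        if original_state is not None:
            # Round-trip the client's own state, never the proxy's wrapped state.
            params["state"] = original_state
        sep = "&" if urlsplit(client_redirect_uri).query else "?"
        return 302, client_redirect_uri + sep + urlencode(params)
    # No trusted target: answer inline and escape every IdP-controlled value.
    body = (
        "<h1>Authorization failed</h1>"
        f"<p>Error: {html.escape(error)}</p>"
        f"<p>{html.escape(error_description or '')}</p>"
    )
    return 400, body
```

Refusing to redirect unless the target passes the allowlist is what prevents the open redirect exercised by the attacker-URI test, and `html.escape` covers the injection test.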

@@ -1644,6 +1644,7 @@ class TestObservabilityCallbackBans:
            "braintrust_api_key",
            "braintrust_project",
            "phoenix_project_name",
            "phoenix_project_name_override",
            "wandb_api_key",
            "weave_project_id",
            "gcs_bucket_name",
@@ -1675,6 +1676,7 @@ class TestObservabilityCallbackBans:
            "posthog_api_url",
            "braintrust_project",
            "phoenix_project_name",
            "phoenix_project_name_override",
        ],
    )
    def test_observability_field_in_metadata_dict_is_rejected(

@@ -0,0 +1,74 @@
"""
Unit tests for the is_thinking_enabled method in BaseConfig.

Tests the fix for issue #28576: handle None thinking param without crashing.
"""

import pytest

from litellm.llms.base_llm.chat.transformation import BaseConfig


class TestIsThinkingEnabled:
    """Test is_thinking_enabled handles various thinking parameter values."""

    @pytest.fixture
    def transformer(self):
        """Create a BaseConfig instance for testing."""

        # BaseConfig is abstract, so we create a minimal concrete subclass
        class ConcreteConfig(BaseConfig):
            def __init__(self):
                pass

            def get_complete_url(self, *args, **kwargs):
                return ""

            def validate_environment(self, *args, **kwargs):
                return {}

            def transform_request(self, *args, **kwargs):
                return {}, {}

            def transform_response(self, *args, **kwargs):
                return None

            def get_supported_openai_params(self, model: str):
                return []

            def map_openai_params(self, *args, **kwargs):
                return {}

            def get_error_class(self, *args, **kwargs):
                from litellm.llms.base_llm.chat.transformation import BaseLLMException

                return BaseLLMException(500, "test error")

        return ConcreteConfig()

    @pytest.mark.parametrize(
        "non_default_params,expected",
        [
            # thinking=None should not crash, returns False
            ({"thinking": None}, False),
            # thinking={'type': 'enabled'} returns True
            ({"thinking": {"type": "enabled"}}, True),
            # thinking key missing returns False
            ({}, False),
            # thinking={} returns False
            ({"thinking": {}}, False),
            # thinking with different type returns False
            ({"thinking": {"type": "disabled"}}, False),
            # reasoning_effort present returns True
            ({"reasoning_effort": "medium"}, True),
            # both thinking enabled and reasoning_effort returns True
            ({"thinking": {"type": "enabled"}, "reasoning_effort": "high"}, True),
            # falsy thinking values should not crash
            ({"thinking": False}, False),
            ({"thinking": 0}, False),
            ({"thinking": ""}, False),
        ],
    )
    def test_is_thinking_enabled(self, transformer, non_default_params, expected):
        """Test is_thinking_enabled with various parameter combinations."""
        result = transformer.is_thinking_enabled(non_default_params)
        assert result == expected, (
            f"Expected {expected} for params {non_default_params}, got {result}"
        )
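The parametrized cases above amount to a small truth table. A minimal standalone function that satisfies every row follows; it is a sketch, not `BaseConfig.is_thinking_enabled` itself, and treating any non-None `reasoning_effort` as enabling thinking is an assumption (the table only covers string values).

```python
from typing import Any, Dict


def is_thinking_enabled(non_default_params: Dict[str, Any]) -> bool:
    """Hypothetical standalone version of the check the tests pin down."""
    thinking = non_default_params.get("thinking")
    # Only a dict whose "type" is "enabled" counts; None, False, 0, "" and {}
    # all fail the isinstance check or the lookup, so nothing can crash.
    thinking_on = isinstance(thinking, dict) and thinking.get("type") == "enabled"
    return thinking_on or non_default_params.get("reasoning_effort") is not None
```

Guarding with `isinstance` before calling `.get` is what avoids the `AttributeError` on `thinking=None` that issue #28576 describes.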