[Infra] Promote internal staging to main (#27245)

* default requested_model to empty string on litellm-side rejects

* Update litellm/router.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: scope key access_group_ids override by team's assigned groups

A team member could set any access_group_ids on their key (e.g. a group assigned only to a different team) and override the team's model restriction. Intersect the key's access_group_ids with team_object.access_group_ids in _key_access_group_grants_model so foreign groups are dropped before model expansion.

Adds a regression test that asserts expansion is never called for foreign groups.

* [Fix] Proxy: Skip Personal Budget Hook When Reservation Covers Counter

The reservation path (PR #26845) atomically pre-fills `spend:user:{user_id}` and admits at the strict-`<` boundary. The legacy `_PROXY_MaxBudgetLimiter` pre-call hook re-reads the same counter with `>=`, so a reservation that fills the counter to exactly `max_budget` (e.g. a request without a `max_tokens` cap that falls back to reserving the smallest remaining headroom) is rejected by the hook even though the reservation already admitted it.

Skip the hook when the request's active `budget_reservation` covers `spend:user:{user_id}`. The reservation is the source of truth for that counter cross-pod; the legacy `>=` path remains in place for requests without a reservation (e.g. paths that bypass the reservation entirely).

Reproduces as `tests/otel_tests/test_prometheus.py::test_user_budget_metrics` on a fresh user with `max_budget=10` calling `fake-openai-endpoint` without `max_tokens`.

Adds focused unit coverage in `tests/test_litellm/proxy/hooks/test_max_budget_limiter.py`.
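The boundary mismatch described in the budget-hook fix above can be sketched as follows. This is only an illustration: the function names and the dict shape of the reservation are hypothetical stand-ins, not the proxy's actual code. Only the strict-`<` versus `>=` comparison and the `spend:user:{user_id}` counter key come from the commit message.

```python
from typing import Optional


def reservation_admits(counter_before: float, max_budget: float) -> bool:
    """Reservation path: admit while the counter is strictly below the budget."""
    return counter_before < max_budget


def legacy_hook_rejects(counter_now: float, max_budget: float) -> bool:
    """Legacy pre-call hook: reject once the counter has reached the budget."""
    return counter_now >= max_budget


def should_run_legacy_hook(user_id: str, reservation: Optional[dict]) -> bool:
    """Skip the legacy hook when the active reservation covers the user's counter.

    The {"counters": [...]} shape is an assumption made for this sketch.
    """
    if reservation is None:
        return True
    return f"spend:user:{user_id}" not in reservation.get("counters", ())


max_budget = 10.0
counter_before = 0.0

# The reservation admits the request, then fills the counter to exactly max_budget.
admitted = reservation_admits(counter_before, max_budget)
counter_after = max_budget

# Re-reading the same counter with >= would now reject the admitted request.
hook_would_reject = legacy_hook_rejects(counter_after, max_budget)

# With the fix, the hook is skipped because the reservation covers the counter.
reservation = {"counters": ["spend:user:u1"]}
run_hook = should_run_legacy_hook("u1", reservation)
```

Requests without a reservation still go through the legacy `>=` check, which is why `should_run_legacy_hook` returns `True` when `reservation` is `None`.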
* harden bedrock file bucket validation

* Fix syntax errors from botched merge in router.py

* Fix Vertex batch output edge cases

* [Fix] RBAC: Drop management_routes Write Fallback for Admin Viewer

Greptile P1: the unsafe-method branch of `_check_proxy_admin_viewer_access` ended with a blanket `if route in management_routes: return`. That set is a mix of reads (info/list — handled via the safe-method GET branch above) and writes. The fallback let Admin Viewer POST to write endpoints not enumerated in `_ADMIN_VIEWER_BLOCKED_WRITE_ROUTES`, including:
- /team/block, /team/unblock, /team/permissions_update
- /jwt/key/mapping/{new,update,delete}
- /key/bulk_update
- /key/{key_id}/reset_spend

Remove the fallback. The two remaining allow sets (admin_viewer_routes and global_spend_tracking_routes) are both read-only, so removal does not affect the legitimate POST-as-read cases (e.g. /spend/calculate, which is in spend_tracking_routes ⊂ admin_viewer_routes).

Tests:
- 8 new parametrized cases pinning each previously-leaking management write endpoint to 403 on POST for PROXY_ADMIN_VIEW_ONLY.

* fix(tests): anchor VCR redis cassette key to repo root

`os.path.relpath` with no `start` arg uses the current working directory, so running pytest from a subdirectory produced a different Redis key than running from the repo root. CI-recorded cassettes and locally-replayed runs would silently miss each other's cache. Anchor the path to the repo root (derived from `__file__`) so the key is stable regardless of CWD.

https://claude.ai/code/session_018uCx7pcrkdUJZrCVMaTdPx

* fix: gate key access_group override on group's own assignment

Replaces the previous intersect-with-team.access_group_ids check, which made the override unreachable in practice (the team-gate fallback already covered every case the intersection allowed).
The override now resolves each of the key's access_group_ids via get_access_object and accepts the group only if its assigned_team_ids includes the key's team_id, or its assigned_key_ids includes the key's token. This fulfills the original ask (a key can extend a team's allow-list via a group the admin granted to that team or that specific key) while still rejecting foreign groups referenced by team members of other teams.

* [Fix] Proxy/Key Management: Honor team_member_permissions /key/list In /key/list Endpoint

When a team grants /key/list via team_member_permissions, non-admin members should see all keys for that team — same as a team admin. Previously the classification in list_keys() only checked admin status, so permitted members fell into the service-account-only path and could not see other members' personal keys. Routes those members into the full-visibility set.

* Fix access-group bypass via litellm-model fallback path

When _get_all_deployments returns 0 candidates and the litellm-model fallback branch (_get_deployment_by_litellm_model) finds deployments that the access-group filter then empties, _access_group_filter_emptied_candidates remained False (it was captured before that branch ran). The router would then proceed to default fallbacks; the fallback model could have no access_groups and short-circuit the filter, silently serving a caller blocked by access-group restrictions.

Update the flag inside the litellm-model branch when filtering empties a non-empty candidate set so the default-fallback guard still triggers.

* fix(proxy): redact MCP server URL and headers for non-admin viewers (VERIA-8)

Many MCP integrations (Zapier, etc.) embed an upstream API key directly in the server URL, e.g. ``https://actions.zapier.com/mcp/<api-key>/sse``.
The list and single-server endpoints were returning the full URL to any authenticated user — `_redact_mcp_credentials` only stripped the explicit ``credentials`` field, and `_sanitize_mcp_server_for_virtual_key` only ran for restricted virtual keys. Non-admin internal users could read the dashboard, click the unmask toggle, and exfiltrate the raw token.

Add `_sanitize_mcp_server_for_non_admin` that runs on top of the existing credential redaction and clears the credential-bearing fields:
- ``url`` (the primary leak vector)
- ``spec_path`` (OpenAPI spec URLs that may carry tokens)
- ``static_headers`` / ``extra_headers`` (Authorization)
- ``env`` (arbitrary secrets)
- ``authorization_url`` / ``token_url`` / ``registration_url``

Identity fields (``server_id``, ``alias``, ``mcp_info``, etc.) are preserved so the UI can still list servers a non-admin's team has access to.

Apply the new sanitizer in `fetch_all_mcp_servers` and the per-server fetch path right after the existing virtual-key branch. Update the existing `test_list_mcp_servers_non_admin_user_filtered` assertions that previously checked URL visibility.

Frontend defense-in-depth: hide the URL unmask toggle on `mcp_server_view.tsx` unless the viewer is a proxy admin.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix runtime policy attachment initialization

Mark runtime-created policies and attachments initialized so global policy attachments created from the policy builder apply immediately without requiring a restart.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(router): cover _try_early_resolve_deployments_for_model_not_in_names

The router_code_coverage CI check requires every function in router.py to be referenced by at least one test under tests/{local_testing,router_unit_tests,test_litellm} in a file with "router" in its name. The recently-extracted helper had no direct test, so the check failed with "0.45% of functions in router.py are not tested".
Add a focused test that exercises the four return paths: model already in self.model_names, no fallback applies, pattern-router match, and default_deployment substitution (also asserting the stored default isn't mutated).

https://claude.ai/code/session_019AVp1XL7RT9RxRe4qRLkay

* Fix policy registry teardown in tests

Reset the policy ID index during policy engine test cleanup so stale policy versions cannot leak between tests.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(batches): count non-chat tokens, validate batch-file model access (VERIA-39) (#27015)

* fix(batches): count non-chat tokens and validate every model in batch file

Two security control bypasses on POST /v1/batches:

1. `_get_batch_job_input_file_usage` only summed tokens for `body.messages` (chat completions). Embedding (`input`) and text completion (`prompt`) batches reported zero, letting massive non-chat workloads slip past TPM rate limits. Extend the counter to handle string and list shapes for both fields.

2. The batch input file was forwarded to the upstream provider without inspecting the models named inside the JSONL — only the outer `model` query parameter was checked against the caller's allowlist. A caller restricted to gpt-3.5 could submit a batch targeting gpt-4o and the upstream would execute it under the proxy's shared API key. Add `_get_models_from_batch_input_file_content` (returns the distinct `body.model` values) and call it from `_enforce_batch_file_model_access` in the pre-call hook, which runs each model through `can_key_call_model` so the same allowlist semantics (wildcards, access groups, all-proxy-models, team aliases) the proxy enforces on `/chat/completions` apply here too. Any unauthorized model raises a 403 before the file is forwarded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(batches): count pre-tokenized prompt/input shapes, classify 403 logs

Two follow-ups from the Greptile review on the batch validation PR:

1.
P1 TPM bypass via integer token arrays. The OpenAI batch schema accepts ``prompt`` and ``input`` as ``list[int]`` (a single pre-tokenized prompt) or ``list[list[int]]`` (multiple) in addition to the string and ``list[str]`` shapes. Pre-fix only the string shapes were counted, so a caller could submit a batch with hundreds of millions of pre-tokenized tokens and the rate limiter would record zero. Extract the per-field logic into ``_count_prompt_or_input_tokens`` and count each int as one token.

2. P2 access-denial logs were indistinguishable from I/O failures. ``count_input_file_usage`` caught every exception under a generic "Error counting input file usage" message, so an intentional 403 from ``_enforce_batch_file_model_access`` looked the same in the logs as a missing file or a Prisma timeout. Catch ``HTTPException`` separately and log 403s at WARNING level with a security-relevant message before re-raising.

Tests cover the new shapes: single ``list[int]``, ``list[list[int]]`` (the worst-case bypass vector), and embeddings ``input`` with pre-tokenized arrays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy): re-validate user_id after /user/info re-parses query (#27009)

* fix(proxy): re-validate user_id ownership after /user/info re-parses query

The route-level access check in `RouteChecks.non_proxy_admin_allowed_routes_check` reads `request.query_params.get("user_id")`, which decodes literal `+` to spaces. The endpoint then re-parses the raw query string with `urllib.unquote` in `get_user_id_from_request` to preserve `+` characters (so plus-addressed emails work as user_ids). Those two paths produce different ids: a caller who registered a user_id containing a literal space could pass the route check and then read another user's row by sending the encoded `+` form.
Add `_enforce_user_info_access` and call it after `_normalize_user_info_user_id` returns the final id. Proxy admin / view-only admin still bypass; everyone else must match the resolved user_id (or have no user_id, which falls back to the caller's own id later in the handler).

Tests cover the admin bypass, owner-match path, and the cross-user lookup that this change blocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy): apply user_info ownership check to PROXY_ADMIN_VIEW_ONLY

`_enforce_user_info_access` was bypassing both PROXY_ADMIN and PROXY_ADMIN_VIEW_ONLY, but the upstream route check in `RouteChecks.non_proxy_admin_allowed_routes_check` only treats PROXY_ADMIN as a true admin for the `/user/info` route — view-only admins go through the `user_id == valid_token.user_id` enforcement along with regular users. Mirroring that asymmetry left the same encoded-`+` bypass open for view-only admins whose user_id contains a literal space.

Drop the PROXY_ADMIN_VIEW_ONLY exemption so the post-decode re-check matches the upstream rule. Update tests: a view-only admin must now be blocked from cross-user lookups but still allowed to read their own row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(spend-logs): opt-in suppression of stack traces in spend-tracking error logs

Adds LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS env var. When set to true and the proxy log level is INFO or above, spend-tracking error paths emit a single ERROR line without the full traceback. Stack traces are preserved at DEBUG and the Sentry / proxy_logging_obj.failure_handler path is unchanged.
The new spend_log_error helper is wired through the spend write hot path:
- DBSpendUpdateWriter (update_database, _update_*_db, batch upsert, redis-commit fallbacks)
- _ProxyDBLogger._PROXY_track_cost_callback
- get_logging_payload exception path
- update_spend / update_daily_tag_spend / spend logs queue monitor

Resolves LIT-2704.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(spend-logs): preserve no-traceback behavior for update_daily_tag_spend

This call site previously logged a single-line error via verbose_proxy_logger.error() with no traceback. Switching it to spend_log_error(..., exc=e) caused a full stack trace to render by default (when LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS is unset), which contradicts the PR goal of leaving default behavior unchanged. Revert this specific site to the original error log call.

* fix(spend-logs): preserve no-traceback behavior for update_daily_tag_spend

Bugbot caught a regression: the previous error log here was a single-line verbose_proxy_logger.error(...) with no traceback. spend_log_error attaches the active exception's traceback by default (when the suppression env var is unset), so swapping it in changed default behavior. Revert this one site to its original .error() call to keep the PR strictly opt-in.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(spend-logs): suppress traceback in SpendLogs error_information row

Extend LITELLM_SUPPRESS_SPEND_LOG_TRACEBACKS to the failure callback so the per-row Metadata pane in the UI no longer shows the stack trace when the opt-in env var is set, matching the existing console-side suppression.
https://claude.ai/code/session_014dztoRbRnRvq54HL9EyHx6

* [Fix] Proxy: Repair Merge Fallout In Router-Override Fallback Auth

Conflict resolution for #26968 dropped the `Iterator` typing import (NameError at module load), left a dead `fallback_models = cast(...)` block, and the new tests called `_enforce_key_and_fallback_model_access` without the now-required `request` kwarg.

* isolate dual OTEL handlers

* harden cloud file compatibility path

* harden cloud file compatibility path

* [Fix] Proxy/Key Management: Align Key-Org Membership Checks On Generate And Regenerate

Mirrors the membership rule on /key/update so that /key/generate and /key/{key}/regenerate apply the same `_validate_caller_can_assign_key_org` gate when the caller specifies an `organization_id`. Proxy admins bypass. The check no-ops when `organization_id` is not being set.

* thread trusted params through vertex file content

* trust only server legacy file flag

* chore(proxy): keep public AI hub unauthenticated

* fix(proxy): preserve low-detail readiness status

* [Test] Anthropic: Replace Legacy Claude-4-Sonnet Alias With Haiku 4.5

Three live-API tests pinned to claude-4-sonnet-20250514, which is a non-canonical alias of claude-sonnet-4-20250514. Anthropic's main API no longer resolves the legacy form under freshly issued keys, so the tests fail with not_found_error. The token counter test pinned to claude-sonnet-4-20250514 itself (deprecation_date 2026-05-14, two weeks out) was on borrowed time too.

Bump all four to claude-haiku-4-5-20251001 — capability superset for what these tests exercise (streaming, parallel tool calling, extended thinking, token counting), no upcoming deprecation, cheaper per-token.

* chore(proxy): move URL-valued model/file_id guard from SDK to proxy

The previous per-provider guards in HuggingFace, Oobabooga, and Gemini files lived in the SDK layer, breaking SDK callers who legitimately pass URL-valued model identifiers.
Move the check to the proxy boundary in add_litellm_data_to_request so SDK users keep working while proxy users default-deny URL-valued model and file_id, with admin opt-in via litellm.provider_url_destination_allowed_hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Chore] Proxy/UI: Drop stray _experimental/out/chat/index.html

This file is a regenerable UI build artifact that should not be tracked in source. Removing so the merge into litellm_internal_staging stays clean.

* [Test] Anthropic Passthrough: Bump Streaming Cost-Injection Test To Haiku 4.5

test_anthropic_messages_streaming_cost_injection hits the proxy's /v1/messages route, which routes via the anthropic/* wildcard to api.anthropic.com. The 404 surfaced in the test was Anthropic's own not_found_error propagated back through the proxy (visible from the x-litellm-model-id hash on the response — the proxy did route).

Same root cause as the prior commit: the legacy claude-4-sonnet-20250514 alias is no longer recognized by Anthropic's main API under the new key. Swap to claude-haiku-4-5-20251001 — same routing path, canonical model.

* fix(proxy): handle ownership-recording failures after upstream create

If record_container_owner raises after the upstream container is created, the user previously got a 500 with no usable container — they were billed for an unreachable resource.

Move ownership recording into the create path's exception handling and split the two failure modes:
- HTTPException from the recorder (auth conflicts) propagates verbatim so the client sees the real status code, not a generic LLM error.
- Unexpected exceptions are logged and swallowed; the response is returned to the caller so they aren't billed for a container they can't address. The DB row stays untracked until an operator reconciles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(guardrails): close post-call coverage gaps

* fix(types): add /team/permissions_bulk_update to management_routes

The blocklist check in _check_proxy_admin_viewer_access only fires for routes that match LiteLLMRoutes.management_routes — the bulk-update endpoint was missing from that list, so the test for view-only admins on /team/permissions_bulk_update fell through to "allow."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Test] Anthropic Passthrough: Bump Thinking Tests Off Legacy Sonnet 4 Alias

base_anthropic_messages_test.test_anthropic_messages_with_thinking and test_anthropic_streaming_with_thinking still pinned to claude-4-sonnet-20250514 — the same legacy alias Anthropic no longer recognizes under freshly issued keys. The other four tests in this base class already use claude-sonnet-4-5-20250929; these two were missed.

Bump to claude-haiku-4-5-20251001 (supports_reasoning=true, no upcoming deprecation). Subclasses including TestAnthropicPassthroughBasic inherit these methods.

* fix(guardrails): cover multi-choice output variants

* fix(proxy): preserve public ai hub ui setting

* fix(scim): cascade FK cleanup on user delete and surface block status in UI

SCIM DELETE /Users/{id} previously called litellm_usertable.delete without clearing rows that FK back to the user, so Postgres rejected the delete with LiteLLM_InvitationLink_user_id_fkey and the SCIM caller saw a 500. Add a helper to drop invitation_link, organization_membership, and team_membership rows before the user delete (mirrors /user/delete in internal_user_endpoints).

Also add a Status column to the Virtual Keys and Internal Users tables so admins can see at a glance which keys are blocked and which users SCIM has deactivated. SCIM-blocked keys carry a tooltip explaining the origin.

Pin the dashboard's Node version to 20 via .nvmrc to match CI.
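The ordering problem behind the SCIM delete fix above can be reproduced in miniature. The sketch below swaps in SQLite (via Python's standard `sqlite3` module) for the proxy's Postgres/Prisma stack, and the table names, column names, and helper names are simplified, hypothetical stand-ins. It only illustrates why dependent rows must be removed before the user row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE users (user_id TEXT PRIMARY KEY);
    CREATE TABLE invitation_links (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL REFERENCES users(user_id)
    );
    INSERT INTO users VALUES ('u1');
    INSERT INTO invitation_links (user_id) VALUES ('u1');
    """
)


def delete_user_naive(db: sqlite3.Connection, user_id: str) -> None:
    """Delete only the user row; fails while a child row still references it."""
    db.execute("DELETE FROM users WHERE user_id = ?", (user_id,))


def delete_user_with_cleanup(db: sqlite3.Connection, user_id: str) -> None:
    """Delete dependent rows first, then the user, in one transaction."""
    with db:  # commits on success, rolls back on error
        db.execute("DELETE FROM invitation_links WHERE user_id = ?", (user_id,))
        db.execute("DELETE FROM users WHERE user_id = ?", (user_id,))


try:
    delete_user_naive(conn, "u1")
    naive_failed = False
except sqlite3.IntegrityError:
    conn.rollback()  # discard the failed attempt
    naive_failed = True

delete_user_with_cleanup(conn, "u1")
remaining_users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

In the real fix the same idea is applied to the invitation_link, organization_membership, and team_membership rows named in the commit.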
* chore: update Next.js build artifacts (2026-05-02 03:21 UTC, node v20.20.2)

* perf(proxy): cache container/skill ownership reads on the hot path

Container ownership and skill rows are looked up on every retrieve / delete / list / file-content / chat-completion-with-skill call. The new stores wrapped raw Prisma queries with no cache, putting one DB round-trip on each request.

Add an in-process TTL'd cache mirroring the _byok_cred_cache pattern in mcp_server/server.py: per-key (value, monotonic_timestamp), 60s TTL, 10000-entry cap with full-clear on overflow, invalidated by every write. Negative results (`None`) are cached too so untracked-resource checks also skip the DB.

Tests cover: cache-after-first-hit, negative caching, write invalidation, no-caching-on-DB-error, TTL expiry, capacity eviction. 56 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: update Next.js build artifacts (2026-05-02 03:39 UTC, node v20.20.2)

* fix: remove traceback key instead of it being ""

* fix: linting error

* fix(scim): preserve scim_active on PUT when client omits the field

A SCIM PUT may legally omit `active` (full-replace with the field absent). Pydantic fills the SCIMUser.active default of True, so the PUT handler was overwriting metadata.scim_active with True even when the client never sent it — silently reactivating a previously SCIM-blocked user and unblocking their keys.

Use model_fields_set to detect whether the client actually sent `active`. If omitted, preserve the prior scim_active value and skip the cascade to virtual keys.

Also drop comments added in this PR that just narrate what the code does; keep only the docstrings and the SQL-NULL pitfall note that explain non-obvious behaviour.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(proxy): use set lookup for permitted agent filters

* fix(mcp): redact command fields for non-admin server views

* fix(proxy): forward decoded container ids after ownership checks

* fix(caching): handle stale isolated Redis semantic index

* fix(cloudflare): support response_text in streaming chunk parser

Newer Cloudflare Workers AI models (e.g. Nemotron) emit 'response_text' instead of 'response' on streamed chunks. The non-streaming path was already updated to fall back to 'response_text' (#26385), but the streaming chunk parser still only read 'response', which caused streaming requests against those models to silently produce empty content.

Mirror the non-streaming fallback in CloudflareChatResponseIterator.chunk_parser and add a streaming test for the response_text shape.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* Fix code qa

* Address bugbot: drop dead encode/decode helpers; preserve empty custom_id

- Remove unused _encode_gcp_label_value / _decode_gcp_label_value singular helpers; only the _chunks variants are actually called.
- Use 'is not None' check for custom_id so empty-string custom_ids are still labeled and round-trip through batch outputs.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* Forward Vertex file content logging context

* test vertex file content logging forwarding

Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>

* Fix Vertex batch output logging mutation

* fix: don't mutate caller's logging_obj in _try_transform_vertex_batch_output_to_openai

The method was overwriting logging_obj.optional_params, logging_obj.model, and logging_obj.start_time on the caller's Logging instance.
When invoked from llm_http_handler.py's generic framework path, the framework's own logging_obj (which already went through pre_call) had its properties clobbered, causing model and start_time to reflect the last batch line's values rather than the original call context.

Fix: create a fresh local Logging instance for the per-line transformation instead of mutating the incoming logging_obj. The caller's object is now left entirely untouched regardless of whether a logging_obj was passed in or not.

Regression tests added to verify model, start_time, and optional_params are not mutated on the caller's logging_obj.

Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>

* feat: add opt-out flag for Vertex batch output transformation

Adds litellm.disable_vertex_batch_output_transformation (default False). When True, afile_content returns raw Vertex predictions.jsonl untouched so users that parse candidates/modelVersion directly are not broken.

* fix(anthropic,bedrock): omit thinking/output_config when reasoning_effort="none"

Setting reasoning_effort="none" on Anthropic chat models (direct, Bedrock Invoke, Bedrock Converse, Vertex AI Anthropic, Azure AI Anthropic) crashed LiteLLM with:

    litellm.APIConnectionError: 'NoneType' object has no attribute 'get'

Both the Anthropic chat transformation and Bedrock Converse called ``AnthropicConfig._map_reasoning_effort`` and assigned the ``None`` it returns for ``"none"`` directly to ``optional_params["thinking"]``. Downstream ``is_thinking_enabled`` then did ``optional_params["thinking"].get("type")`` and crashed.

Pop ``thinking`` (and on Claude 4.6/4.7, ``output_config``) instead of assigning ``None``, restoring the documented contract that ``reasoning_effort="none"`` means "do not enable thinking". This also prevents downstream Anthropic 400s ("thinking: Input should be an object", "output_config.effort: Input should be ...") if the bug were ever masked.
Verified end-to-end against the live Anthropic API and Bedrock Converse on claude-opus-4-{5,6,7} and claude-sonnet-4-6, plus Bedrock Invoke for Claude 4.5/4.6. Vertex AI Anthropic and Azure AI Anthropic inherit the fixed ``map_openai_params`` from ``AnthropicConfig`` and need no further changes.

* fix(vertex-ai): set response=null on batch error entries per OpenAI spec

The Vertex batch output transformer was emitting both a populated 'response' and 'error' for failed batch entries. The OpenAI Batch output spec defines them as mutually exclusive: on error 'response' MUST be null. This broke any consumer using 'result["response"] is None' to detect failures.

* test(vertex-ai): cover transformation_error path emits response=null

* fix(security): sandbox jinja2 in gitlab/arize/bitbucket prompt managers

DotpromptManager was hardened to render through ImmutableSandboxedEnvironment. The three sibling managers (gitlab, arize, bitbucket) were missed and still instantiate plain jinja2.Environment(), leaving the same attribute-traversal SSTI primitive open: a template fetched from a GitLab/BitBucket repo or Arize Phoenix workspace can reach __class__.__init__.__globals__ and execute arbitrary Python on the proxy host.

Match the dotprompt pattern by switching all three to ImmutableSandboxedEnvironment. The sandbox blocks the dunder-traversal chain while leaving normal {{ var }} substitution intact, so the template surface is unchanged for legitimate use.

Adds tests/test_litellm/integrations/test_prompt_manager_ssti.py (18 cases) verifying each manager's jinja_env is a sandbox, that classic SSTI payloads raise SecurityError, and that ordinary variable rendering still works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(proxy): drop client-supplied pricing fields from request bodies

The proxy currently forwards request-body pricing parameters (the fields on `CustomPricingLiteLLMParams`, plus `metadata.model_info`) into the core call path.
Those fields belong to deployment configuration, not to per-request input — sending them from a client mutates the request's recorded cost and, via `litellm.completion` → `register_model`, the process-wide `litellm.model_cost` map for every later caller in the worker.

Strip them at the boundary. The strip set is built from `CustomPricingLiteLLMParams.model_fields` so pricing fields added later are covered automatically. Operators who do want clients to supply per-request pricing can opt back in per key or team via `metadata.allow_client_pricing_override = true`, mirroring the existing `allow_client_mock_response` and `allow_client_message_redaction_opt_out` flags.

Tests cover the strip set's coverage, root and metadata strips, the opt-in skip on both key and team metadata, and a regression check that the global `litellm.model_cost` map is unmutated after a stripped request.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(proxy): log stripped pricing fields at debug for operator visibility

Operators upgrading would otherwise see client-supplied pricing overrides silently stop applying with no diagnostic. Emit a debug-level line listing the dropped fields and pointing at the opt-in flag when any are stripped; stay silent on the no-op path so the log isn't filled with noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(proxy): move pricing strip below the litellm_metadata JSON-string parse

The strip ran before the proxy parses ``litellm_metadata`` from a JSON string into a dict (a path used by multipart/form-data and ``extra_body`` callers), so ``isinstance(metadata, dict)`` was False and ``model_info`` survived the strip. Move the call to the same post-parse position the ``user_api_key_*`` strip already uses for the same reason.

Adds a regression test exercising the JSON-string ``litellm_metadata`` path.
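The ordering bug in the pricing-strip fix above can be shown with a small sketch. The helper names (`strip_metadata_pricing`, `parse_metadata`, `make_request`) and the one-element `PRICING_KEYS` set are hypothetical. Only the `litellm_metadata` key, the `model_info` field, and the `isinstance(metadata, dict)` guard come from the commit message.

```python
import json
from typing import Any, Dict

PRICING_KEYS = {"model_info"}  # hypothetical stand-in for the real strip set


def strip_metadata_pricing(data: Dict[str, Any]) -> None:
    """Drop pricing keys from metadata, but only when it is already a dict."""
    metadata = data.get("litellm_metadata")
    if isinstance(metadata, dict):
        for key in PRICING_KEYS:
            metadata.pop(key, None)


def parse_metadata(data: Dict[str, Any]) -> None:
    """Turn a JSON-string litellm_metadata into a dict, as multipart callers send it."""
    metadata = data.get("litellm_metadata")
    if isinstance(metadata, str):
        data["litellm_metadata"] = json.loads(metadata)


def make_request() -> Dict[str, Any]:
    """Build a request whose metadata arrives as a JSON string."""
    payload = {"model_info": {"input_cost_per_token": 0}, "tag": "a"}
    return {"litellm_metadata": json.dumps(payload)}


# Old order: strip first, parse second. The strip sees a str and does nothing.
old = make_request()
strip_metadata_pricing(old)
parse_metadata(old)

# Fixed order: parse first, strip second. The strip sees a dict and removes the key.
new = make_request()
parse_metadata(new)
strip_metadata_pricing(new)
```

After both runs, `old["litellm_metadata"]` still carries `model_info` while `new["litellm_metadata"]` does not, which is the difference the regression test pins down.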
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(responses): replace legacy claude-4-sonnet alias in multiturn tool-call test

Anthropic's main API no longer resolves the non-canonical 'claude-4-sonnet-20250514' alias for freshly issued keys, returning 404 not_found_error. PR #27031 already swept three other live tests pinned to this alias to claude-haiku-4-5-20251001 but missed test_multiturn_tool_calls in the responses API suite, which is now failing reliably on PR CI runs (e.g. PR #27074, job 1603363).

Bump the two model references in test_multiturn_tool_calls to the same claude-haiku-4-5-20251001 snapshot used by PR #27031 -- it covers everything this test exercises (tool calling, multi-turn) and isn't on a deprecation schedule.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* chore(proxy): close callback-config and observability-credential side channels

Two related gaps in the proxy's request bouncer:

1. ``is_request_body_safe`` (auth_utils.py) walked the request-body root and the ``litellm_embedding_config`` nested dict, but not ``metadata`` or ``litellm_metadata``. The same fields it bans at root — Langfuse / Langsmith / Arize / PostHog / Braintrust / Phoenix / W&B Weave / GCS / Humanloop / Lunary credentials and routing — were silently accepted when the caller put them inside metadata, retargeting observability callbacks to a caller-controlled host with caller-supplied creds. Walk both metadata containers (and parse the JSON-string form sent via multipart / ``extra_body``) through the same banned-params helper, so the existing ``allow_client_side_credentials`` opt-in covers both paths consistently.

2. The banned-params list was hand-maintained and lagged the canonical ``_supported_callback_params`` allow-list in ``initialize_dynamic_callback_params``.
Derive the observability bans from that allow-list (minus a small ``_SAFE_CLIENT_CALLBACK_PARAMS`` set for informational fields like ``langfuse_prompt_version`` and ``langsmith_sampling_rate``) so future integrations are covered automatically; ``_EXTRA_BANNED_OBSERVABILITY_PARAMS`` carries the handful of fields integrations read but the allow-list hasn't caught up to. A guard test fails CI if a new entry is added to ``_supported_callback_params`` without an explicit safe-list decision.

Separately in ``litellm_pre_call_utils.py``: add ``callbacks``, ``service_callback``, ``logger_fn``, and ``litellm_disabled_callbacks`` to ``_UNTRUSTED_ROOT_CONTROL_FIELDS``. The first three are appended to worker-wide ``litellm.{input,success,failure,_async_*,service}_callback`` lists / ``litellm.user_logger_fn`` from inside ``function_setup`` — one request poisons every subsequent caller in that worker. The last is the inverse primitive: the legitimate path reads it from key/team metadata, the request-body version silently disables admin-configured audit / observability for the call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auth): per-param allow must continue, not return early

A pre-existing logic bug in ``_check_banned_params``: when the deployment-level ``configurable_clientside_auth_params`` permitted one banned field, the loop ``return``-ed on the first match instead of ``continue``-ing, so any other banned param later in the same body or metadata dict was never checked. This PR's metadata walk multiplies the surface where that bypass matters — a body pairing an allowed ``api_base`` with an observability credential like ``langfuse_host`` would silently pass.

Proxy-wide ``allow_client_side_credentials`` keeps ``return`` (it's a global opt-in for every banned param). The per-param branch becomes ``continue`` so only the one explicitly-permitted field is skipped.

Adds a regression test that exercises the api_base + langfuse_host pair.
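The `return` versus `continue` difference in the auth fix above is easy to miss, so here is a minimal sketch of the pattern. The function names, the two-element `BANNED_PARAMS` list, and the use of `ValueError` are assumptions made for illustration and do not reproduce the real `_check_banned_params` signature.

```python
from typing import Any, Dict, Set

BANNED_PARAMS = ["api_base", "langfuse_host"]  # illustrative subset only


def check_banned_buggy(body: Dict[str, Any], allowed: Set[str]) -> None:
    """Pre-fix shape: the first permitted param ends the whole check."""
    for param in BANNED_PARAMS:
        if param in body:
            if param in allowed:
                return  # bug: later banned params are never inspected
            raise ValueError(f"{param} is not allowed in the request body")


def check_banned_fixed(
    body: Dict[str, Any], allowed: Set[str], allow_all: bool = False
) -> None:
    """Post-fix shape: a per-param allow skips only that one param."""
    if allow_all:
        return  # global opt-in still covers every banned param
    for param in BANNED_PARAMS:
        if param not in body:
            continue
        if param in allowed:
            continue  # skip just this field, keep checking the rest
        raise ValueError(f"{param} is not allowed in the request body")


body = {"api_base": "https://example.invalid", "langfuse_host": "https://evil.invalid"}
allowed = {"api_base"}

check_banned_buggy(body, allowed)  # passes silently despite langfuse_host

try:
    check_banned_fixed(body, allowed)
    fixed_rejected = False
except ValueError:
    fixed_rejected = True
```

The `allow_all` flag stands in for the proxy-wide opt-in, which the commit says keeps its early `return`.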
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(vector_store): resolve embedding config at request time, never persist creds The vector store create/update path previously called ``_resolve_embedding_config`` against the admin-configured router/DB model and persisted the resolved ``litellm_embedding_config`` dict (``api_key`` / ``api_base`` / ``api_version``) into the ``litellm_managedvectorstorestable.litellm_params`` column. Because the resolver expanded ``os.environ/...`` references via ``get_secret``, the DB row carried cleartext provider credentials, and the ``/vector_store/{new,info,update,list}`` responses returned them to any authenticated caller who could supply a known admin model name. Move the auto-resolve out of ``create_vector_store_in_db`` and out of the update path. Persist only the user-supplied ``litellm_embedding_model`` reference. Resolve at request-handling time inside ``_update_request_data_with_litellm_managed_vector_store_registry`` so the resolved config lives in the per-request ``data`` dict and is garbage-collected after the response. Legacy rows that were created by an earlier proxy version and already carry a resolved ``litellm_embedding_config`` skip the re-resolution and pass through unchanged so embedding calls keep working. The ``new_vector_store`` response now also runs the existing ``_redact_sensitive_litellm_params`` masker (already used by ``info``, ``update``, and ``list``), defending against caller-supplied cleartext on the create path and against legacy rows whose persisted credentials are still in the database. Existing tests that asserted the old write-time-resolve behaviour are updated to assert the new persistence shape (no embedding config stored, just the model reference). Two new tests cover the use-time path: one asserting fresh resolution happens when a row carries only the model reference, the other asserting legacy rows with persisted config skip re-resolution and continue to work. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(vector_store): tighten registry-mutation comment and dedupe test helpers * fix(vector_store): cache use-time embedding-config resolution Hold the resolved config in a process-memory TTL cache so the request-handling path doesn't run litellm_proxymodeltable.find_first on every vector-store call. * fix(anthropic,bedrock,vertex): forward output_config.effort + 400 on garbage reasoning_effort Follow-up bugs surfaced by the QA sweep on PR #27039 (https://github.com/BerriAI/litellm/pull/27039#issuecomment-4363363610). 1. Stop stripping output_config.effort on Bedrock + Vertex adaptive routes. - Vertex AI Claude 4.6/4.7 accepts output_config.effort on rawPredict (verified end-to-end against us-east5 / global). The strip helper now no-ops for effort. - Bedrock Converse routes output_config into additionalModelRequestFields for anthropic base models so the requested adaptive tier (low/medium/ high/xhigh/max) actually reaches the wire instead of all collapsing to identical thinking. - Bedrock Invoke chat transformation (AmazonAnthropicClaudeConfig) stops popping output_config from the post-AnthropicConfig request body. - Bedrock Invoke /v1/messages allowlist (BedrockInvokeAnthropicMessagesRequest) now lists output_config so the runtime allowlist filter forwards it. 2. Validate effort across Bedrock Converse so 'disabled' / 'invalid' / '' / unsupported tiers (xhigh/max on Sonnet 4.6 or budget-mode 4.5 models) surface as a clean 400 BadRequestError instead of 500. 3. ValueError -> BadRequestError throughout (AnthropicConfig.map_openai_params, _apply_output_config, AmazonConverseConfig._handle_reasoning_effort_parameter). Empty-string effort is now rejected (was silently passing the 'if effort and ...' short-circuit). 4. 
Floor reasoning_effort='minimal' at the Anthropic provider minimum (1024 budget_tokens) via new ANTHROPIC_MIN_THINKING_BUDGET_TOKENS so it's a usable tier on direct Anthropic / Azure AI Anthropic / Vertex AI Anthropic / Bedrock Invoke (all of which 400 below 1024). 5. model_prices: dedupe duplicate supports_max_reasoning_effort key on claude-opus-4-7 / claude-opus-4-7-20260416. Adds regression tests across all five affected paths; existing tests asserting the silent-strip behavior were updated to reflect the new pass-through and clean 400 surfaces. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(constants): make ANTHROPIC_MIN_THINKING_BUDGET_TOKENS a plain constant The documentation CI test (tests/documentation_tests/test_env_keys.py) asserts every os.getenv() key in the source has a matching entry in the litellm-docs config_settings.md table. ANTHROPIC_MIN_THINKING_BUDGET_TOKENS tracks Anthropic's published wire-protocol minimum (1024) — it's not a user-tunable, so making it env-overridable was wrong anyway. Drop the os.getenv() wrapper; the value is now a plain literal. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(anthropic,bedrock): correct effort error message and dedupe effort_map - Remove 'none' from the Bedrock _validate_anthropic_adaptive_effort error message; it was listed as a valid value but rejected by the membership check, leaving users in a feedback loop if they tried 'none'. - Hoist the duplicated reasoning_effort -> output_config.effort mapping out of AnthropicConfig.map_openai_params and AmazonConverseConfig._handle_reasoning_effort_parameter into a single AnthropicConfig.REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT class constant so the two routes cannot drift. 
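A rough sketch of the single-constant idea described above: one mapping is shared by every route, and the error message is built from the mapping's own keys so it cannot list a value the membership check rejects. The names ``EFFORT_MAP``, ``InvalidEffortError`` and ``map_effort`` are hypothetical, and the tier values (in particular ``minimal`` mapping to ``low``) are assumptions for illustration, not the library's actual table.

```python
from typing import Dict

# Assumed mapping for illustration; the real values may differ.
EFFORT_MAP: Dict[str, str] = {
    "minimal": "low",
    "low": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "xhigh",
    "max": "max",
}


class InvalidEffortError(ValueError):
    """Hypothetical stand-in for a 400-style bad-request error."""

    status_code = 400


def map_effort(value: object) -> str:
    """Map a caller-supplied reasoning_effort to an output_config effort tier."""
    # Non-strings and unmapped strings (including "") are both rejected.
    mapped = EFFORT_MAP.get(value) if isinstance(value, str) else None
    if mapped is None:
        # Build the valid list from the mapping so the two cannot drift.
        valid = ", ".join(repr(key) for key in EFFORT_MAP)
        raise InvalidEffortError(
            f"Invalid reasoning_effort: {value!r}. Must be one of {valid}"
        )
    return mapped
```

Because the lookup has no fallback, empty strings and values such as ``"disabled"`` fail at the mapping site instead of reaching the provider.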
* fix(anthropic): translate reasoning_effort on /v1/messages route Closes the remaining QA-sweep gap on PR #27074: Bedrock Invoke /v1/messages was silently ignoring ``reasoning_effort`` because the shared param filter only kept native Anthropic keys, so every effort tier collapsed to the same behavior on the wire (27/231 cells failing across opus-4-5 / opus-4-6 / sonnet-4-6). Map ``reasoning_effort`` to native Anthropic ``thinking`` / ``output_config.effort`` at the ``AnthropicMessagesConfig`` layer so all four /v1/messages routes (direct Anthropic, Azure AI, Vertex AI, Bedrock Invoke) inherit the same translation: - Add ``reasoning_effort`` to ``AnthropicMessagesRequestOptionalParams`` so the param filter in ``AnthropicMessagesRequestUtils.get_requested_anthropic_messages_optional_param`` no longer drops it before the transformation runs. - Add ``_translate_reasoning_effort_to_anthropic`` and call it from ``transform_anthropic_messages_request``. Mirrors ``AnthropicConfig.map_openai_params`` on the chat completion path (re-uses ``_map_reasoning_effort`` and ``REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT``) so the two routes cannot drift. Pops ``reasoning_effort`` so it never reaches the wire. - Caller-supplied native ``thinking`` / ``output_config.effort`` always win — same precedence as ``_translate_legacy_thinking_for_adaptive_model``. - Garbage values (``""``, ``"disabled"``, ``"invalid"``) raise ``AnthropicError(status_code=400)`` instead of falling through and surfacing as 500s from the provider. - ``"none"`` clears thinking + output_config so callers can opt out per request. Also restores the non-adaptive-model test coverage on Bedrock Invoke /v1/messages that the previous commit lost when ``test_bedrock_messages_strips_output_config`` was renamed to the ``forwards`` variant on Opus 4.7. 
Adds a new test file ``test_reasoning_effort_translation.py`` covering the translation at the shared config level (adaptive + non-adaptive models, none, garbage, caller precedence) so all four /v1/messages routes are exercised by a single suite. Adds parametrized + behavioral tests on the Bedrock Invoke /v1/messages suite covering: minimal/low/medium/high/xhigh/max mapping for adaptive models, thinking-budget mapping for non-adaptive Opus 4.5, ``none`` clears both, garbage raises 400, explicit ``output_config`` wins. Refs: https://github.com/BerriAI/litellm/pull/27074 * fix(anthropic,bedrock): reject unmapped reasoning_effort at mapping site Both the chat completion path (AnthropicConfig.map_openai_params) and the Bedrock Converse path (_handle_reasoning_effort_parameter) used REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(value, value) which falls back to the raw input on unmapped keys. Combined with _map_reasoning_effort returning type='adaptive' for any string on Claude 4.6/4.7, garbage values (e.g. 'disabled') could leak into optional_params['output_config']['effort'] unvalidated if map_openai_params ran without the downstream transform_request or _validate_anthropic_adaptive_effort check. Mirror the /v1/messages pattern: use .get(value) (no fallback) and raise BadRequestError immediately when the value is unmapped, co-locating validation with the mapping for defense in depth. * style: black formatting Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(anthropic): stop class-attr leak; gate xhigh/max on every route The reasoning-effort mapping dict was a public class attribute on AnthropicConfig, so BaseConfig.get_config returned it as a request parameter and every Anthropic-backed call (Anthropic / Azure / Vertex / Bedrock Invoke) hit a 400 'REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT: Extra inputs are not permitted' from the provider. Move the mapping to a module-level constant. 
_supports_effort_level only looked the model up under custom_llm_provider='anthropic', so bedrock-prefixed model ids (e.g. bedrock/invoke/us.anthropic.claude-opus-4-7) returned False for both 'max' and 'xhigh' even when the underlying model entry has the flag set. Strip known provider prefixes and retry the lookup against litellm.model_cost directly so per-model gating works on every route. Mirror the per-model xhigh/max gate from AnthropicConfig._apply_output_config in AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic so the /v1/messages route also raises a clean 400 instead of forwarding the unsupported tier. * feat(anthropic,bedrock): strip output_config under drop_params for non-effort models When a proxy fronts Claude Code (which always sends `output_config.effort`) at a pre-4.5 Anthropic model — haiku-3, sonnet-3.5, opus-3, sonnet-4 — the forwarded knob causes a forced 400 the client can't fix. Gating a strip behind the existing `drop_params` flag lets operators opt into silent fixup once and stop worrying about per-model param hygiene. Default (`drop_params=False`) still forwards and surfaces the provider's error, preserving the strict, debuggable contract from #27074. Per https://platform.claude.com/docs/en/build-with-claude/effort the supporting set is Opus 4.5+, Sonnet 4.6+, and Mythos Preview; everything else is dropped (with a verbose_logger warning so the strip is visible). Recognition uses model-name patterns plus a fallback to any `supports_*_reasoning_effort` flag in the model map for forward compatibility with new entries. https://claude.ai/code/session_01WjHq31rvXT6xYNdVmSJvRp (cherry picked from commit 1233943e7861ba8a9062f792310ebd401cb03db8) * fix(base_llm): filter all _-prefixed class attrs from get_config The drop_params strip work added `AnthropicConfig._EFFORT_SUPPORTING_MODEL_PATTERNS` as a private class-level lookup tuple. 
`BaseConfig.get_config()` only filtered the `__`-prefixed names plus `_abc` / `_is_base_class`, so `_EFFORT_SUPPORTING_MODEL_PATTERNS` would have leaked into the request body the same way `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT` did before the previous commit. Generalize the existing `_abc` / `_is_base_class` carve-outs to skip every `_`-prefixed name. `AmazonConverseConfig.get_config()` overrides the base method, so apply the same change there. Also unblocks future internal helpers from accidentally serialising into the wire body. * fix(anthropic): drive output_config.effort support from model map flags Replace hardcoded _EFFORT_SUPPORTING_MODEL_PATTERNS with a JSON-backed check that uses supports_*_reasoning_effort flags from the model map. Add supports_minimal_reasoning_effort: true to opus-4-5 and mythos-preview entries (which previously only carried supports_reasoning) so the JSON remains the single source of truth for effort capability. * fix(anthropic,bedrock,databricks): four reasoning_effort follow-ups - claude-sonnet-4-6 + reasoning_effort=max no longer 400s. Renamed _is_opus_4_6_model to _is_claude_4_6_model at three sites and added supports_max_reasoning_effort: true to 12 model entries in the JSON cost map (10 sonnet 4.6 ids + OpenRouter opus 4.6/4.7). - _map_reasoning_effort now raises BadRequestError(400) directly with llm_provider, instead of letting Databricks (and similar callers) surface its raw ValueError as a 500. - output_config.effort on Opus 4.5 over Bedrock no longer 400s for missing effort-2025-11-24 beta. Flipped JSON to "effort-2025-11-24" for bedrock + bedrock_converse and added an auto-attach branch in _process_tools_and_beta for non-adaptive Anthropic + output_config on Converse. - reasoning_effort=xhigh / =max on legacy budget-mode models (Haiku 4.5, Sonnet 4.5, Opus 4.5) now map to thinking.budget_tokens 8192 / 16384 instead of returning 400. Added two constants in litellm/constants.py. Tests updated for all four flips. 
Validated end-to-end via 306-cell live proxy matrix (6 model families x 3 routes x 17 effort cases), all pass. * fix(databricks): validate reasoning_effort and set output_config on adaptive Claude The Databricks path called `AnthropicConfig._map_reasoning_effort` for Claude models but never validated the effort string nor set `output_config.effort` for adaptive models (Claude 4.6/4.7). Since `_map_reasoning_effort` returns `type=adaptive` for ANY non-None / non-"none" string on adaptive models (including "disabled", "invalid", ""), Databricks silently accepted garbage and emitted a request without an `output_config.effort`, collapsing every adaptive tier to identical behavior. Match the Anthropic native, Bedrock Converse, Bedrock Invoke, and /v1/messages paths: when the resolved `thinking` is non-None on a 4.6/4.7 model, look up the value in `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT` and either raise a clean `BadRequestError` or set `optional_params["output_config"]`. * fix(azure): omit model from image generation and image edit deployment requests Azure OpenAI routes image gen/edit by deployment in the URL; sending the deployment id in model breaks gpt-image-2 (invalid_value). Strip model from JSON for deployments/.../images/generations and from multipart data for .../images/edits. Non-deployment URLs (e.g. Azure AI FLUX) unchanged. Fixes #26316. Co-authored-by: Cursor <cursoragent@cursor.com> * test(azure): exercise image gen JSON filter via HTTP client; dedupe image edit URL - Image generation tests patch HTTPHandler.post / get_async_httpx_client so make_*_azure_httpx_request runs and wire json is asserted on call kwargs. - Azure image edit: strip model in finalize_image_edit_multipart_data using the same URL string the handler passes to POST (no second get_complete_url in transform). BaseImageEditConfig default finalize is a no-op. 
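The deployment-scoped filtering in the Azure image commits above might look roughly like this. It is a sketch under assumptions: the helper name ``strip_model_for_deployment`` and the URL regex are hypothetical, and the real handlers apply the filter separately to the JSON body (generations) and the multipart data (edits).

```python
import re
from typing import Any, Dict

# Assumed URL shape: .../deployments/<deployment-id>/images/generations or /edits
_DEPLOYMENT_IMAGE_URL = re.compile(r"/deployments/[^/]+/images/(generations|edits)")


def strip_model_for_deployment(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``payload`` without ``model`` for deployment-scoped URLs."""
    if _DEPLOYMENT_IMAGE_URL.search(url):
        # The deployment in the URL already selects the model.
        return {key: value for key, value in payload.items() if key != "model"}
    # Non-deployment URLs keep the payload unchanged.
    return dict(payload)
```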
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(azure_ai/anthropic): promote output_config out of extra_body so validation runs `azure_ai` is registered in `litellm.openai_compatible_providers`, so `add_provider_specific_params_to_optional_params` (litellm/utils.py) auto-stuffs any non-OpenAI kwarg (e.g. `output_config={"effort": "..."}`) into `optional_params["extra_body"]`. `AzureAnthropicConfig.transform_request` then strips `extra_body` entirely on the way out, silently dropping the param — and `AnthropicConfig._apply_output_config` never sees it, so `effort="invalid"` / `effort="xhigh"` on a non-supporting model quietly reaches the model with default behavior instead of returning a clean 400 (as the native `anthropic` provider does). Promote the keys back to top-level `optional_params` (using `setdefault` so explicit top-level values win) before delegating to the parent `AnthropicConfig`. Apply in both `validate_environment` and `transform_request` so flag detection (`is_mcp_server_used`, etc.) and output-config validation both run. Surfaced by the QA matrix expansion on PR #27074: 20 cells where Azure returned 200 while `anthropic` returned 400 — all `output_config` mode across haiku_4_5, sonnet_4_5, opus_4_5, sonnet_4_6, opus_4_6, opus_4_7 families with `effort` in {invalid, xhigh, max, low, medium, high}. Tests: * `test_output_config_promoted_from_extra_body`: valid effort reaches data * `test_invalid_output_config_effort_raises_via_extra_body`: 400 on bad effort * `test_unsupported_effort_xhigh_raises_via_extra_body`: 400 on xhigh-on-Sonnet-4.6 * `test_extra_body_promotion_does_not_clobber_top_level`: setdefault semantics * test(image_gen): expect no model in Azure image edit multipart (#26316) Align test_azure_image_edit_litellm_sdk with deployment-scoped Azure edits. 
Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(anthropic): extract _validate_effort_for_model to prevent drift The chat completion path (`_apply_output_config`) and the /v1/messages pass-through (`AnthropicMessagesConfig._translate_reasoning_effort_to_anthropic`) both gate `max` / `xhigh` per model. The two sites had diverged from near-identical copies into separately maintained blocks, creating a real drift risk when a new model tier (e.g. Claude 4.8) lands -- a contributor could update one site and miss the other. Centralise the gating in `AnthropicConfig._validate_effort_for_model`, which returns an error message string or `None`. Each call site keeps its own provider-appropriate exception type (`BadRequestError` for the chat path, `AnthropicError` for the /v1/messages pass-through) but the gating decision now comes from one place. Net -11 LOC. Adds a parametrised unit test exercising the helper directly across 4.5 / 4.6 / 4.7 model families and `max` / `xhigh` / lower-effort inputs. Existing tests at both call sites continue to pass unchanged. Addresses Greptile finding on PR #27074. * fix(databricks): narrow reasoning_effort_value to str for mypy `non_default_params.get("reasoning_effort")` returns `Any | None`, but `REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get()` expects `str`. Mypy flagged this on the strict pass. Narrow with `isinstance` before the lookup; non-strings fall through to the existing `BadRequestError` below with a clean validation message, so behavior is unchanged. Fixes a regression introduced by 1a10746e95 in this PR. * feat(proxy): add health_check_reasoning_effort for model health checks Co-authored-by: Cursor <cursoragent@cursor.com> * test(image_gen): align Azure image gen fixture with body omitting model Expected JSON matches deployment-scoped Azure POST (#26316). 
Co-authored-by: Cursor <cursoragent@cursor.com> * test(anthropic/chat): force PR-local model_cost map via autouse fixture CI runs without LITELLM_LOCAL_MODEL_COST_MAP=True, so litellm.model_cost is loaded from main-branch JSON (default model_cost_map_url) instead of the PR's checked-out model_prices_and_context_window.json. Tests that assert per-model flags added in this PR (supports_max_reasoning_effort, supports_xhigh_reasoning_effort) therefore pass locally but fail in CI with 'AssertionError: assert False is True' on 5 cases: - test_anthropic_model_supports_effort_param_recognizes_supporting_models [anthropic.claude-mythos-preview, bedrock/.../mythos-preview, claude-opus-4-5-20251101] - test_supports_effort_level_handles_provider_prefixes [bedrock/invoke/us.anthropic.claude-sonnet-4-6-max-True, claude-sonnet-4-6-max-True] Add an autouse fixture at tests/test_litellm/llms/anthropic/chat/conftest.py that monkey-patches litellm.model_cost to the PR-local JSON for every test in this directory. The parent conftest already snapshots+restores litellm.model_cost per-function, so the mutation is contained. This is a scoped workaround. The proper fix is to set the env var globally in the test workflow once the ~10 inline self-set test files are audited; tracking that as a follow-up issue. * [Fix] Docker: Pin Wolfi And Uv To Multi-Arch Index Digests The previous pins resolved to single-platform amd64 manifests, so buildx pulled the same amd64 base for both linux/amd64 and linux/arm64 targets. The published OCI index then advertised an arm64 entry whose layers are byte-identical to amd64 -- arm64 users got an amd64 binary. Switch all three Dockerfiles to the multi-arch image-index digests: - cgr.dev/chainguard/wolfi-base (index has linux/amd64 + linux/arm64) - ghcr.io/astral-sh/uv:0.11.7 (index has linux/amd64 + linux/arm64) Resolved with `docker buildx imagetools inspect <ref>` -- that returns the index digest. 
`docker pull` + `docker inspect` returns the per-host platform digest, which is what slipped in last time. * [Fix] Docker: Pin Uv To Multi-Arch Index Digest In Remaining Dockerfiles Apply the same fix to the three Dockerfiles not in the release pipeline today (alpine, dev, health_check) so they stay correct if/when they're built for arm64 in the future. Wolfi pins are not present in these files; the python:3.11-alpine and python:3.13-slim digests they already use are multi-arch indexes that include arm64/v8, so only the uv pin needed swapping. * fix(xai): fold reasoning_tokens into completion_tokens to satisfy OpenAI invariant xAI's chat completions API accounts reasoning_tokens separately from completion_tokens, but rolls them into total_tokens. This breaks the OpenAI invariant total_tokens == prompt_tokens + completion_tokens that downstream consumers (including litellm's own _usage_format_tests in tests/llm_translation/base_llm_unit_tests.py:58) rely on. Live capture (grok-3-mini-beta, 2026-05-04): prompt=14, completion=10, total=336, reasoning=312 14 + 10 = 24, NOT 336. OpenAI's o1/o3 reasoning models include reasoning_tokens in completion_tokens, leaving the prompt+completion=total invariant intact. xAI deviates. This patch aligns xAI to OpenAI semantics by folding reasoning_tokens into completion_tokens after the parent OpenAI parser runs. The fold is idempotent and defensive: - Only fires when total_tokens == prompt_tokens + completion_tokens + reasoning_tokens (the documented xAI shape). Refuses to fold if the gap doesn't match, guarding against silent corruption when xAI changes accounting. - Skips if completion_tokens already covers the gap (already normalised — e.g. cost calc replays a previously-folded Usage). xai.cost_calculator.cost_per_token already added reasoning_tokens to the visible completion count for billing. Post-fold the Usage block now satisfies that invariant directly, so the cost calc would double-bill. 
Updated cost_per_token to detect the OpenAI-normalised shape (total == prompt + completion) and skip the reasoning add-on in that case, falling through to the legacy raw-shape behaviour for callers that bypass the transformation (e.g. proxy log replay). Tests: - Adds TestXAIReasoningTokenFolding covering: gap-explained-fold, idempotent-no-double-fold, no-reasoning-skip, gap-mismatch-skip. - Adds test_already_normalised_usage_does_not_double_count_reasoning to lock the cost-calc idempotency. - Updates 7 pre-existing cost-calc tests whose total_tokens was internally inconsistent (used the OpenAI-normalised total but kept reasoning_tokens external) to use the documented xAI raw shape total = prompt + visible completion + reasoning. Pre-existing values masked the missing-fold by accident. Verified end-to-end against the live xAI API: LITELLM_LOCAL_MODEL_COST_MAP=False (CI default) + XAI_API_KEY set + pytest tests/llm_translation/test_xai.py::TestXAIChat::test_prompt_caching -> PASSED in 18.81s (was: AssertionError on usage.total_tokens == usage.prompt_tokens + usage.completion_tokens) 20/20 tests in tests/test_litellm/llms/xai/test_xai_cost_calculator.py and 8/8 in tests/test_litellm/llms/xai/test_xai_chat_transformation.py pass. * refactor(bedrock/converse): delegate effort gating to AnthropicConfig._validate_effort_for_model Removes the duplicated max/xhigh gating logic in _validate_anthropic_adaptive_effort and the now-unused _supports_effort_level_on_bedrock helper. Per-model gating now flows through the centralized AnthropicConfig._validate_effort_for_model (whose _supports_effort_level already strips Bedrock prefixes), so the chat completion, /v1/messages, and Bedrock Converse paths can't drift when a new gated effort tier is added. * Implement normalize_nonempty_secret_str function to trim whitespace from secrets and treat empty values as unset. Update proxy_server to use this function for Grafana credentials. 
Enhance tests to validate the new normalization behavior. * Fix qdrant semantic cache miss metadata * chore(deps): refresh dependency locks * chore(deps): authorize pytest license * fix: preserve tokenizer decode round trips * refactor(anthropic): drive adaptive-thinking gate via supports_adaptive_thinking flag Three of greptile's open comments on #27074 (P2 converse:512, P1 databricks:361, and the underlying capability-flag policy rule) flagged the same pattern: _is_claude_4_6_model(...) or _is_claude_4_7_model(...) used inline as a runtime 'is this an adaptive-thinking model?' check. That requires a code release each time a new adaptive Claude lands. Consolidate the inline gating to AnthropicModelInfo._is_adaptive_thinking_model, and switch the helper itself to read a new supports_adaptive_thinking flag from `model_prices_and_context_window.json` via `_supports_factory`, falling back to the family pattern only when the model-map entry doesn't carry the flag (preserves OpenRouter / Vercel / Bedrock-prefixed variants that route through the same code path with non-canonical ids). Adds `supports_adaptive_thinking: true` to the four 4.6/4.7 anthropic entries (opus-4-6 + dated, opus-4-7 + dated, sonnet-4-6). Bedrock-prefixed and Vertex-prefixed entries don't need the flag because both fall back through the family pattern (the helper short-circuits early on True from either path) and the bedrock/vertex Claude IDs all match the existing opus-4-{6,7} / sonnet-4-{6,7} pattern. 
Affected call sites: - `bedrock/chat/converse_transformation.py:_handle_reasoning_effort_parameter` - `anthropic/chat/transformation.py:_map_reasoning_effort` - `anthropic/chat/transformation.py:map_openai_params` (output_config branch) - `databricks/chat/transformation.py:map_openai_params` (output_config branch) The remaining `_is_claude_4_6_model` / `_is_claude_4_7_model` references in `AnthropicConfig._validate_effort_for_model` and `AnthropicConfig.get_supported_openai_params` are intentionally retained: they're per-model gating fallbacks for variants whose model-map entries don't yet carry the `supports_max_reasoning_effort` / `supports_reasoning` flag. Those are documented in-place. Tests: 537 anthropic/bedrock/databricks/vertex/messages tests pass. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * chore(deps): address dependency review notes * test(model_prices): add supports_adaptive_thinking to schema `test_aaamodel_prices_and_context_window_json_is_valid` validates the model-map JSON against an explicit schema with `additionalProperties`, so the new `supports_adaptive_thinking` flag added in 98ced0ae43 needs a matching schema entry. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * refactor: remove unnecessary comments from #27074 Strip out the explanatory and historical comments that don't carry business-logic justification. Comments that simply narrate what code does — or that explain prior behavior, what was changed, or which PR introduced a fix — are removed. Docstrings are reduced to a one-line summary where the long form repeated information already evident from the code or test data. No code-behavior changes. All 643 affected unit tests still pass. 
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test: keep decode token test local * chore(deps): align dashboard node engine * feat: selectively apply routing strategy according to model name * style: make _model_supports_effort_param more concise * refactor(anthropic,bedrock): hoist drop_params output_config warning to module constant Three call sites (anthropic chat, bedrock converse, bedrock invoke messages) emitted the same '...Effort is only supported on Opus 4.5+, Sonnet 4.6+, and Mythos Preview' warning verbatim. Extract DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING in litellm/llms/anthropic/chat/transformation.py and import it from the bedrock sites so future copy edits live in one place. Addresses Michael's review on PR #27074. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * refactor(anthropic,bedrock,databricks): factor BadRequestError for unknown reasoning_effort Three call sites raised the same BadRequestError("Invalid reasoning_effort: ... Must be one of 'minimal', 'low', ...") block when REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT returned None: anthropic chat map_openai_params, bedrock converse _handle_reasoning_effort_parameter, and databricks chat reasoning_effort path. Extract AnthropicConfig._raise_invalid_reasoning_effort(model, value, llm_provider) so future copy edits / valid-set changes happen in one place. Typed as NoReturn so type-checkers correctly narrow control flow at call sites. Addresses Michael's review on PR #27074. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * Clean up Redis semantic cache isolation fallback * fix(guardrails): align banned_keywords + azure_content_safety call_type gates with runtime route_type The hooks gated on ``call_type == "completion"`` but the proxy ingress passes ``route_type`` straight through as ``call_type`` — ``"acompletion"`` for /v1/chat/completions and ``"aresponses"`` for /v1/responses. 
Tests passed because they used the literal sync ``"completion"`` value, masking the gap. Switch both hooks to ``is_text_content_call_type`` (matches the canonical runtime values: completion / acompletion / aresponses) and update existing tests to assert against runtime values, plus parametrize a regression test that pins the gate. * fix: remove unused import * Add semantic cache legacy migration flag * Treat 0 team_member_budget as no cap * chore(caching): annotate qdrant quantization_params dict type Mypy infers the dict's value type from the first branch (Dict[str, bool]) which clashes with the scalar branch's mixed-type inner dict. Explicit Dict[str, Any] annotation lifts the inference. * chore(caching): remove allow_legacy_unscoped_cache_hits opt-in The flag was an opt-in escape hatch for the cross-tenant leak the rest of the patch closes — flipping it on (env var or constructor param) re-enables exactly the VERIA-54 primitive on either backend. There is no operational need that the secure path doesn't already meet: - Qdrant: legacy points without ``litellm_cache_key`` payload are excluded by the must-clause filter and treated as misses; new sets populate the cache key, so cold-start lasts only as long as the natural cache rebuild. - Redis: existing unscoped index can't carry the new schema; the init path falls back to ``{name}_isolated`` (and recreates it on stale schema), leaving the legacy index untouched. Drop the constructor param, env-var fallback, ``_using_legacy_unscoped_index`` flag, the legacy-reuse branch in ``_init_semantic_cache``, and the matching guards in set/get paths. Update tests to drop the legacy-mode cases and assert the secure-only behaviour. 
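To illustrate why legacy points become misses under the secure-only path, here is a client-side emulation of the cache-key filter. In the actual Qdrant backend the filtering is described as a must-clause on the ``litellm_cache_key`` payload; the function name ``pick_scoped_hit`` and the ``score`` / ``response`` field names are assumptions made for this sketch.

```python
from typing import Any, Dict, List, Optional


def pick_scoped_hit(
    points: List[Dict[str, Any]], cache_key: str, threshold: float
) -> Optional[str]:
    """Return the best response whose payload carries the caller's cache key."""
    for point in sorted(points, key=lambda p: p["score"], reverse=True):
        payload = point.get("payload") or {}
        # Legacy points have no litellm_cache_key, so they never match.
        if payload.get("litellm_cache_key") != cache_key:
            continue
        if point["score"] >= threshold:
            return payload.get("response")
    return None
```

A legacy point with a near-perfect similarity score is still skipped, so the cache simply rebuilds as new sets populate the key.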
* fix(container): keep ownership-filter exceptions out of the LLM-error path filter_container_list_response runs after the upstream call has already succeeded; treating an ownership-lookup failure as an LLM-API error fires post_call_failure_hook for a successful upstream call and returns a misleading provider-shaped error to the client. Run the filter outside the try/except so genuine LLM errors stay scoped to the upstream call. * chore(container,skills): LRU eviction for owner caches; widen file_purpose Literal Two cleanups from the /simplify pass: * ``_CONTAINER_OWNER_CACHE`` and ``_SKILL_CACHE`` now LRU-evict via ``OrderedDict.popitem(last=False)`` instead of full ``clear()`` at capacity. Full clears converted a steady-state cached workload into a periodic full-DB-load oscillation as the cache repopulated from zero and cleared again. Reads now ``move_to_end`` so the just-touched entry survives the next eviction. Mirrors the pre-existing LRU pattern in ``_remember_container_owner``. * ``LiteLLM_ManagedObjectTable.file_purpose`` Literal now includes ``"container"`` so Pydantic validation accepts rows written by the ownership store. * chore(container,skills): drop legacy-access opt-out env vars LITELLM_ALLOW_UNTRACKED_CONTAINER_ACCESS and LITELLM_ALLOW_UNOWNED_SKILL_ACCESS were operator-toggleable opt-outs for the cross-tenant access primitive this PR closes — flipping either on re-enabled exactly the VERIA-20 read path. Default-secure with no escape hatch matches sibling fixes (vector-store cred isolation, semantic cache key isolation, user_config strip): all rejected the opt-out-of-security pattern. Untracked containers and unowned skills (rows that pre-date this enforcement) are admin-only. Non-admin owners need to either re-create via the now-tracked flow or have an admin assign ``created_by`` on the existing row. Update tests to assert the strict-only behaviour. 
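The LRU eviction described in the owner-cache cleanup above can be sketched as follows. The class name ``LRUOwnerCache`` and its methods are hypothetical; only ``OrderedDict.popitem(last=False)`` and ``move_to_end`` come from the commit text.

```python
from collections import OrderedDict
from typing import Optional


class LRUOwnerCache:
    """Tiny LRU cache: evicts the least recently used entry at capacity."""

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._entries:
            return None
        # Mark as most recently used so it survives the next eviction.
        self._entries.move_to_end(key)
        return self._entries[key]

    def set(self, key: str, owner: str) -> None:
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = owner
        # Evict one oldest entry at a time instead of clearing everything.
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)
```

Evicting a single entry keeps the rest of the working set warm, which avoids the fill-then-clear oscillation the commit describes.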
* fix(ownership): reject identity-less callers instead of sharing a sentinel scope UNSCOPED_RESOURCE_OWNER_SCOPE collapsed every caller without an identity field (no user_id / team_id / org_id / api_key / token) into a single shared owner — a cross-tenant access primitive: any two such callers could see and delete each other's containers and skills. Drop the sentinel. ``get_primary_resource_owner_scope`` returns ``None`` and ``get_resource_owner_scopes`` returns ``[]`` for identity-less callers. ``record_container_owner`` and ``LiteLLMSkillsHandler.create_skill`` now reject creates from identity-less callers with a 403 instead of stamping the placeholder. Read paths already deny ``owner is None`` correctly so legacy rows (if any) are admin-only. * fix(proxy): include request-blocked callback params in auth bans * fix: keep skills handler FastAPI-free; fold gcs deny list into the body bouncer Two cleanups: * ``LiteLLMSkillsHandler.create_skill`` raised ``HTTPException`` for identity-less callers, importing FastAPI from a ``litellm/llms/`` module — that violates the project rule that FastAPI lives only under ``proxy/``. Switch to ``ValueError`` (the same shape the rest of the handler uses for not-found/forbidden) and update the test. * The proxy-auth body bouncer derived its observability ban list from ``_supported_callback_params`` only, missing ``_request_blocked_callback_params`` (where ``gcs_bucket_name`` and ``gcs_path_service_account`` live). Two recently-merged sibling PRs (#27019 added the deny list, #27081 added the test asserting these are rejected at the request body root) crossed without folding them together. Union the GCS deny list into the bouncer's derivation so the single source of truth covers both code paths. 
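The identity-less rejection described above can be sketched roughly as below. This is an assumption-heavy illustration: the function names, the ``field:value`` scope format, and the field precedence are hypothetical and do not reproduce ``get_primary_resource_owner_scope`` or ``record_container_owner``. Only the identity field names and the ``None`` / ``[]`` / ``ValueError`` behaviour come from the text.

```python
from typing import Dict, List, Mapping, Optional

# Identity fields named in the commit text; their order here is an assumption.
IDENTITY_FIELDS = ("user_id", "team_id", "org_id", "api_key", "token")


def owner_scopes(caller: Mapping[str, Optional[str]]) -> List[str]:
    """Return one scope per populated identity field; [] for identity-less callers."""
    return [f"{field}:{caller[field]}" for field in IDENTITY_FIELDS if caller.get(field)]


def primary_owner_scope(caller: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the first scope, or None instead of a shared sentinel owner."""
    scopes = owner_scopes(caller)
    return scopes[0] if scopes else None


def record_owner(caller: Mapping[str, Optional[str]], resource_id: str) -> Dict[str, str]:
    """Refuse to stamp ownership when the caller has no identity at all."""
    scope = primary_owner_scope(caller)
    if scope is None:
        # ValueError keeps this layer free of web-framework imports.
        raise ValueError("caller has no identity; refusing to record ownership")
    return {"resource_id": resource_id, "created_by": scope}
```

Returning ``None`` rather than a placeholder means two anonymous callers can never resolve to the same owner.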
* fix(proxy): normalize managed resource team owner field * chore: simplify ownership tracking — drop thin stores, in-memory fallback, hand-rolled cache Substantial reduction (~765 LOC) without changing the security boundary: * Drop ContainerOwnershipStore and LiteLLMSkillsStore — both were one-method-per-Prisma-call wrappers. Inline the calls instead, matching the established pattern in vector_store_endpoints, agent_endpoints, and mcp_server/db.py. * Drop the prisma_client is None in-memory fallback. Production deploys always have Prisma; running ownership-critical paths on a process-local dict is a security footgun in the dev-mode case it was meant to support, and complicates every code path with a branch. Fail-secure: skip recording if Prisma is unavailable, and treat reads as "not found" (admin-only). * Drop the hand-rolled module-level cache. Replace with the existing litellm.caching.in_memory_cache.InMemoryCache, which already has TTL + max-size + eviction tested in its own module. Sentinel string for negative caching since InMemoryCache can't disambiguate "miss" from "cached as None". * Tests: drop coverage for removed code paths (in-memory fallback, hand-rolled cache internals). Keep tests for actual behavior (cache hit-rate, negative caching, owner check, list filtering, identity-less reject, admin bypass). * fix(container): cache list-allow-set, track admin-created containers Address Greptile P2 follow-ups from the prior round: * Cache ``_get_allowed_container_ids`` (60s LRU/TTL keyed by sorted owner-scope tuple) so ``GET /v1/containers`` doesn't issue a fresh ``find_many`` against ``litellm_managedobjecttable`` on every list call. Invalidate the caller's own cache entry when they record a new owner so the just-created container shows up on their next list. * Tighten the admin early-return in ``record_container_owner`` to skip ONLY when there's literally no container ID to stamp. 
An admin with identity (the master-key path populates ``user_id`` + ``api_key``) flows through the normal record path so admin-created containers are tracked like any other caller's. The truly-identity-less admin case still falls through to the 403 below — correct fail-secure default. Skill-cache invalidation gap (also flagged by Greptile) is moot: there is no skill update endpoint exposed; ownership-affecting mutations are only delete (already invalidates) and create (new ID, no cache entry to update). * chore(container): use delete_cache, json-encode scope key, clean test /simplify follow-ups: * Replace the two-``pop`` reach into ``cache_dict``/``ttl_dict`` with the existing public ``InMemoryCache.delete_cache(key)`` — the same idiom used elsewhere in the proxy. Bonus: ``delete_cache`` calls ``_remove_key`` which also handles ``expiration_heap`` consistency the direct pops were silently leaking. * JSON-encode the sorted scope list for the cache key instead of ``"|".join``. ``user_id`` / ``team_id`` / ``org_id`` / ``api_key`` are free-form strings and could contain a literal ``|`` — JSON quoting escapes any in-string separator unambiguously. * Extract ``_allowed_container_ids_cache_key()`` so the read and invalidation sites compute the key the same way. * Fix a placeholder-then-overwrite test construction: the ``__module__.split(".")[0] and "proxy_admin"`` line evaluated to a literal string that was immediately overwritten with the real enum value. Hoist the import and construct directly. * [Fix] Tests: Replace deprecated openrouter/claude-3.7-sonnet with claude-sonnet-4.5 OpenRouter has dropped active endpoints for anthropic/claude-3.7-sonnet, causing test_reasoning_content_completion to fail with a 404 "No endpoints found" error. Switch to anthropic/claude-sonnet-4.5, which is current and supports reasoning streaming. 
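The cache-key change described above (JSON-encode the sorted scope list instead of ``"|".join``) can be illustrated with a small sketch. The helper names and scope strings are hypothetical; the real key builder is ``_allowed_container_ids_cache_key()``, whose body is not shown in this message.

```python
import json
from typing import Iterable


def pipe_joined_key(scopes: Iterable[str]) -> str:
    """Earlier scheme: ambiguous when a scope itself contains a literal '|'."""
    return "|".join(sorted(scopes))


def json_encoded_key(scopes: Iterable[str]) -> str:
    """JSON quoting keeps element boundaries unambiguous."""
    return json.dumps(sorted(scopes))


# Hypothetical free-form identifiers chosen to collide under the pipe join.
one_scope = ["team:b|user:a"]
two_scopes = ["team:b", "user:a"]

collides = pipe_joined_key(one_scope) == pipe_joined_key(two_scopes)
distinct = json_encoded_key(one_scope) != json_encoded_key(two_scopes)
```

Here ``collides`` and ``distinct`` are both true: the pipe join maps two different callers to one key, while the JSON form keeps them apart.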
* feat: routing groups ui * fix(security): prevent secret_fields from leaking into spend logs secret_fields (containing raw HTTP headers including Authorization Bearer tokens) was being included in proxy_server_request['body'] because the body snapshot was a copy.copy(data) of the full request dict. This body gets serialized and persisted in the LiteLLM_SpendLogs table, exposing user credentials in the database. Root cause: data['secret_fields'] was set before the body snapshot at data['proxy_server_request']['body'] = copy.copy(data), so the full raw headers (including auth tokens) ended up in the snapshot. Fix (defense in depth): 1. Exclude 'secret_fields' when creating the body snapshot in litellm_pre_call_utils.py (primary fix) 2. Strip 'secret_fields' in _sanitize_request_body_for_spend_logs_payload as a secondary safeguard secret_fields remains available on the live data dict for legitimate downstream consumers (MCP, Responses API). Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com> * chore: update Next.js build artifacts (2026-05-05 02:13 UTC, node v20.20.2) * [Fix] Proxy: Break managed-resources import cycle on Python 3.13 The Python 3.13 CCI smoke matrix surfaces a partially-initialized-module ImportError when loading the managed files hook chain: litellm.proxy.hooks/__init__ (mid-import) -> enterprise.enterprise_hooks -> litellm_enterprise.proxy.hooks.managed_files -> litellm.llms.base_llm.managed_resources.isolation -> litellm.proxy.management_endpoints.common_utils -> litellm.proxy.utils (re-enters litellm.proxy.hooks) The except ImportError block in hooks/__init__.py silently swallowed the failure, leaving managed_files unregistered and POST /files returning 500 "Managed files hook not found". Two-layer fix: - Inline the 3-line _user_has_admin_view check in isolation.py instead of importing it from litellm.proxy.management_endpoints.common_utils. 
litellm.llms.* should not depend on litellm.proxy.* — removing this layering violation breaks the cycle at its root. - Define PROXY_HOOKS and get_proxy_hook before the conditional enterprise import in litellm/proxy/hooks/__init__.py, so any future re-entry resolves the public names instead of hitting an ImportError on a partially-initialized module. Also fold in two unrelated CCI repairs surfaced in the same staging run: - tests/otel_tests/test_key_logging_callbacks.py: per-key gcs_bucket_name / gcs_path_service_account are now stripped by initialize_dynamic_callback_params, so the GCS client falls through to the env-only branch. Update the assertion to match the new "GCS_BUCKET_NAME is not set" message. - .circleci/config.yml: tests/pass_through_tests now resolves google-auth-library@10.x via the @google-cloud/vertexai 1.12.0 bump, which uses dynamic ESM imports Jest 29 cannot load without --experimental-vm-modules. Pass that flag in the Vertex JS test step. Adds tests/test_litellm/proxy/hooks/test_proxy_hooks_init.py as a regression guard: managed_files / managed_vector_stores must register, and isolation.py must not transitively import litellm.proxy.utils. * [Fix] Proxy: Address Greptile feedback on hook-cycle PR - Move _user_has_admin_view to litellm.proxy._types as user_api_key_has_admin_view (single source of truth). common_utils.py and isolation.py both import from there now, removing the duplicated role-check that could silently diverge if new admin roles are added. - Add pytest.importorskip("litellm_enterprise") to the two regression tests that assert managed_files / managed_vector_stores are registered; those keys come from ENTERPRISE_PROXY_HOOKS so the tests would fail unconditionally in a checkout without the enterprise extra installed. 
* [Fix] Lint: Mark _user_has_admin_view re-export in common_utils Ruff F401 flagged the aliased import as unused within common_utils.py because the name is consumed only by external modules (~15 callers across guardrails, spend tracking, MCP, agents, management endpoints). Add `# noqa: F401 re-exported` so the alias survives lint while keeping a single source of truth in litellm.proxy._types. * refactor(azure): move image gen JSON helper; rename image edit finalize hook - Add image_generation/http_utils.azure_deployment_image_generation_json_body; call from azure.py (keeps AzureChatCompletion focused on chat). - Rename finalize_image_edit_multipart_data to finalize_image_edit_request_data with docstring covering multipart and JSON POST payloads (review feedback). Co-authored-by: Cursor <cursoragent@cursor.com> * test(proxy): cover health_check_reasoning_effort for completion mode Co-authored-by: Cursor <cursoragent@cursor.com> * [Fix] Tests: Use master key for /otel-spans in test_chat_completion_check_otel_spans /otel-spans now requires proxy admin (returns 401 'Only proxy admin can be used to generate, delete, update info for new keys/users/teams. Route=/otel-spans' for non-admin callers). Switch the GET call to use the master key sk-1234 while keeping the generated key for the chat-completion request that produces the spans. * [Fix] Tests: Pick chat-completion OTEL trace by content, not recency The /otel-spans endpoint returns process-wide spans and tags most_recent_parent by max start_time. After tightening that route to proxy_admin (sk-1234), the GET /otel-spans request itself emits auth spans that beat the chat-completion spans on start_time, so most_recent_parent now points at the request's own auth trace (['postgres', 'postgres']) and the >=5-span assertion fails. Pick the chat-completion trace by content: it is the only trace whose span list is a superset of {postgres, redis, raw_gen_ai_request, batch_write_to_db}. 
Verified locally end-to-end against otel_test_config.yaml + OTEL_EXPORTER=in_memory: 3/3 runs green. * [Fix] CI: Enable VCR replay for test_azure_o_series The Azure o-series tests were excluded from the conftest's VCR auto-marker because of a respx/vcrpy transport-patching conflict, but the only respx reference in the file was an unused `MockRouter` import. Drop the dead import and remove the file from the conflict set so cassettes record on first run and replay thereafter, eliminating the 60-95s live Azure latency that was crashing xdist workers under --timeout=120 thread-mode timeouts. * [Fix] Tests: Restore /metrics access for prometheus test suite /metrics now requires auth by default; tests/otel_tests/test_prometheus.py makes 4+ unauthenticated GETs against http://0.0.0.0:4000/metrics, so every prometheus test in CI now fails the metric assertion. Set require_auth_for_metrics_endpoint: false in otel_test_config.yaml to opt out for this test job, which scrapes /metrics directly. Verified locally: 8/8 prometheus tests green (one flaky retry on test_proxy_success_metrics that pre-dates this PR). Also drop the -x stop-on-first-failure flag from the otel test command so all failures in the job surface in a single CI run rather than hiding behind whichever one trips first. * [Perf] CI: Skip Redundant Playwright Apt Install in E2E UI Job The cimg/python:3.12-browsers base image already ships every Chromium system dependency Playwright needs (libnss3, libatk-bridge2.0-0, libcups2, etc. — the install log shows them all as "already the newest version"). Passing --with-deps to `npx playwright install` therefore runs an apt-get update + install for nothing, but pays the full cost of hitting Ubuntu mirrors. On a recent run those mirrors stalled hard: apt-get update alone took 6m53s at 81.5 kB/s with several archives returning connection refused. Drop --with-deps and persist ~/.cache/ms-playwright alongside node_modules so the Chromium binary is also reused across runs. 
Bump the cache key to v2 so the existing v1 entry (which only contained node_modules) is not loaded and skipped over the new browser path. * [Fix] Docker: Remove Hardcoded Prisma Binary Target For Multi-Arch Builds PRISMA_CLI_BINARY_TARGETS="debian-openssl-3.0.x" was hardcoded in docker/Dockerfile.non_root by #17695. On a buildx linux/arm64 leg this forces prisma to download the amd64 schema-engine into an arm64 image, so 'prisma migrate deploy' fails at startup with 'Could not find schema-engine binary'. Removing the env lets prisma auto-detect per build platform: amd64 builds still resolve to debian-openssl-3.0.x (Wolfi falls back to debian, same binary as before), and arm64 builds now correctly fetch linux-arm64-openssl-3.0.x. The offline-cache pre-warm goal of #17695 is preserved — only which binaries fill the cache changes. Fixes #19458 * [Fix] UI: Clear Admin Session Cookies Before Establishing Invited User's Session (#27227) The invite-signup form was writing the new user's token via raw `document.cookie` at `path=/`, while the rest of the auth surface uses `storeLoginToken` (which writes at `path=/ui` and mirrors to sessionStorage). After signup the inviter's `path=/ui` cookie kept winning path-specificity matching, and sessionStorage still held the inviter's token, so the dashboard rendered as the inviter rather than the newly created user. Treat invite signup as a principal-change boundary — clear prior session cookies first, then store the new token via the canonical helper. 
* test: add 24hr Redis-backed VCR cache to additional test suites (#27159) * test: add 24hr Redis-backed VCR cache to additional test suites Extracts the existing llm_translation VCR plumbing into a reusable helper (tests/_vcr_conftest_common.py) and wires it into the conftest.py files of the test directories listed in LIT-2787: audio_tests, batches_tests, guardrails_tests, image_gen_tests, litellm_utils_tests, local_testing, logging_callback_tests, pass_through_unit_tests, router_unit_tests, unified_google_tests The same helper is also adopted by the pre-existing llm_translation and llm_responses_api_testing conftests to remove the copy-pasted VCR setup. Each consuming conftest: - registers the Redis persister via pytest_recording_configure - auto-marks collected tests with pytest.mark.vcr (skipping respx-using files where applicable, since respx and vcrpy both patch httpx) - gates cassette writes on test success via _vcr_outcome_gate The cache is opt-in via CASSETTE_REDIS_URL; when unset, VCR is disabled and tests hit live providers as before. LITELLM_VCR_DISABLE=1 still forces a bypass for ad-hoc local runs. Test directories that run LiteLLM proxy in Docker (build_and_test, proxy_logging_guardrails_model_info_tests, proxy_store_model_in_db_tests) are intentionally not included: VCR.py patches the in-process httpx transport and cannot intercept calls made from inside a Docker container. The installing_litellm_on_python* jobs make no LLM calls and don't benefit from caching. https://linear.app/litellm-ai/issue/LIT-2787/add-24hr-caching-to-additional-test-suites * test(vcr): add safe-body matcher to handle JSONL and binary request bodies vcrpy's stock body matcher inspects Content-Type and unconditionally runs json.loads on application/json bodies. JSON Lines payloads (used by the Bedrock batch S3 PUT and other upload paths) crash that with json.JSONDecodeError: Extra data, before the matcher can return 'not a match'. 
This was the root cause of the batches_testing CI job failing on test_async_create_file once VCR auto-marking was applied to the batches_tests directory. Add a conservative byte-equality body matcher and use it in place of 'body' in the shared match_on tuple. The matcher is strictly more conservative than vcrpy's default — the only thing it gives up is 'different JSON key order is treated as the same body', which doesn't apply to deterministic litellm-built request payloads. It can never produce a false positive that the default would have rejected, so there is no cross-contamination risk. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): exclude tests that VCR replay actively breaks A few tests are incompatible with cassette replay and were failing on the latest CI run after VCR auto-marking was extended to local_testing and logging_callback_tests: - test_amazing_s3_logs.py (logging_callback_tests): the test asserts on a per-run response_id that should round-trip through a real S3 PUT/LIST. vcrpy's boto3 stub intercepts the PUT and the LIST replays stale keys, so the freshly-generated id is never found. - test_async_embedding_azure (logging_callback_tests) and test_amazing_sync_embedding (local_testing): the failure branches deliberately pass api_key='my-bad-key' to assert that the failure callback fires. We scrub auth headers from cassettes (so the bad-key request matches the prior good-key request), and vcrpy replays the recorded 200 — the failure callback never fires. - test_assistants.py (local_testing): the OpenAI Assistants polling APIs mint fresh thread/run IDs every recording session and then poll until status=='completed'. Replays of those polled GETs can never match a freshly-generated run id, so every CI run effectively re-records and the suite blows past the 15m no_output_timeout. Skip these from VCR auto-marking so they continue to hit live providers as they did before this change. 
The remaining tests in each directory still get cached.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): expand skip lists for second batch of incompatible tests

Follow-up to the previous commit. After re-running CI on the rebuilt branch, several more tests surfaced as VCR-replay-incompatible:

- litellm_utils_testing :: test_get_valid_models_from_dynamic_api_key
  Calls GET /v1/models with api_key='123' to assert the result is empty. We scrub auth headers, so the bad-key request matches the prior good-key cassette and replays the recorded model list.
- litellm_utils_testing :: test_litellm_overhead.py
  Measures litellm_overhead_time_ms as a percentage of total wall-clock time. With cached responses the upstream 'network' time collapses to microseconds, blowing past the 40% threshold the test asserts on. Skip the whole file (every parametrization is at risk).
- local_testing_part1 :: test_async_custom_handler_completion and test_async_custom_handler_embedding
  Same bad-key failure-callback pattern as the already-skipped test_amazing_sync_embedding.
- litellm_router_testing :: test_router_caching.py
  Asserts on litellm's own router-level response cache by comparing response1.id to response2.id across repeat upstream calls (test bypasses litellm cache via ttl=0 and expects upstream to return a *new* id). With VCR replay both upstream calls return the same cassette body, so the ids are identical. Skip the whole file.
- logging_callback_tests :: test_async_chat_azure (preemptive)
  Same shape as already-skipped test_async_embedding_azure; was masked by upstream OpenAI rate-limit failures on baseline.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(vcr): use item.path and tighten matcher docstring

- Replace pytest's deprecated item.fspath with item.path in apply_vcr_auto_marker_to_items so we don't emit deprecation warnings under pytest 8.
- Clarify _safe_body_matcher docstring to reflect actual behavior (direct == first, then UTF-8 bytes comparison, no repr fallback).

Addresses Greptile review feedback on PR #27159.

* test(vcr): swallow all RedisError on cassette save/load

Cassette persistence is strictly best-effort: any Redis-side failure (connection blip, timeout, OutOfMemoryError when the maxmemory cap is hit, READONLY replicas, etc.) should degrade to 'test passed but cassette not cached' rather than fail the test on teardown.

Previously the persister only caught ConnectionError and TimeoutError, so OutOfMemoryError (which Redis Cloud raises when the cassette cache hits its memory cap and there are no evictable keys) propagated out of vcrpy's autouse fixture and ERRORed otherwise-passing tests on teardown. This caused the litellm_utils_testing CircleCI job to fail on the latest commit's run, even though the underlying test was a unit test that used mock_response and produced no real upstream traffic (the cassette was dirtied by a background langfuse callback). The rerun only succeeded because Redis evictions happened to free enough room before the SET, i.e. it was timing-dependent flakiness.

Catch redis.exceptions.RedisError (the common base of all server- and client-side Redis exceptions) on both save and load, and parametrize the regression tests across ConnectionError, TimeoutError, and OutOfMemoryError to pin the new behavior.

* test(vcr): surface cassette-cache failures with warnings + session banner

When the persister silently swallows a Redis OOM (or any RedisError) on save/load there is otherwise no visible signal that the cache is degraded: tests pass, the cassette just isn't persisted, and the next session still hits the same Redis at the same near-cap memory.

Add three layers of observability so that failure mode is loud:

1. Per-process health counters ("save_failures", "load_failures", and the last error string for each), exposed via cassette_cache_health() and reset via reset_cassette_cache_health(). The persister increments these in addition to logging.
2. VCRCassetteCacheWarning (UserWarning subclass) emitted via warnings.warn() inside the persister's except block. Pytest's built-in warnings summary at session end automatically lists every such warning, so the failure is visible in CI logs without any conftest-level wiring.
3. Session-end banner via emit_cassette_cache_session_banner() and a stderr-fallback atexit handler registered from register_persister_if_enabled(). Two states:
   - red "VCR CASSETTE CACHE DEGRADED" when save_failures or load_failures > 0
   - yellow "VCR CASSETTE CACHE NEAR CAPACITY" (no failures, but used_memory >= 85% of maxmemory) so the next session knows the Redis is approaching OOM before any SET actually fails

Capacity comes from a best-effort INFO memory probe (cassette_cache_capacity_snapshot) that returns None on any failure or when maxmemory is uncapped. The atexit handler skips xdist workers so only the controller emits.

Tests: parametrize the existing save/load swallow-error tests across ConnectionError/TimeoutError/OutOfMemoryError, add direct tests for the health counters and warning emission, and a new test_vcr_conftest_common_banner.py covering banner output for every state (silent/red/yellow/disabled/xdist-worker).

* test(vcr): bucket cassettes by API key fingerprint, drop bad-key skips

Tests that deliberately call an LLM API with a bad key (e.g. to assert that the failure callback fires, or that check_valid_key returns False) were being silently served the prior good-key cassette: we scrub the real Authorization / x-api-key header from the cassette before storing it, so a follow-up bad-key call is byte-identical to the good-key call under the existing match_on tuple.
Add a 'key_fingerprint' custom matcher that distinguishes requests by the SHA-256 of their API-key headers. The fingerprint is stamped into a synthetic 'x-litellm-key-fp' header by a new before_record_request hook, which then strips the real auth headers (we have to do the scrubbing here instead of via vcrpy's filter_headers knob, because filter_headers runs *first* and would erase the value we want to hash). Bad-key requests now get a different cassette bucket than good-key requests, so vcrpy will not replay a recorded 200 in place of the expected 401. The fingerprint is a one-way hash of the secret, so cassettes never contain the key. This permanently removes the 'bad-key' category of skips: - tests/local_testing: dropped ::test_amazing_sync_embedding, ::test_async_custom_handler_completion, ::test_async_custom_handler_embedding - tests/logging_callback_tests: dropped ::test_async_chat_azure, ::test_async_embedding_azure - tests/litellm_utils_tests: dropped ::test_get_valid_models_from_dynamic_api_key Coverage: 7 new unit tests in tests/test_litellm/test_vcr_safe_body_matcher.py covering header stripping, fingerprint determinism, no-auth bucketing, good-vs-bad key discrimination, x-api-key (Anthropic/Azure) discrimination, and idempotence under replay. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): drop redundant comments and docstrings Trim narration of code that is already self-evident from function and variable names. Keep the two genuinely non-obvious bits: - ordering constraint between filter_headers and before_record_request, which would invite a maintainer to re-introduce the bug if removed - the per-directory _VCR_INCOMPATIBLE_FILES rationale, since 'why exactly is this skipped' is not knowable from the test name alone Also drop the 40-line commented-out drop-in conftest snippet at the bottom of _vcr_conftest_common.py — the consuming conftests are the canonical reference. 
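The key-fingerprint bucketing described above can be sketched on a plain header dict. This is a simplified illustration under stated assumptions: ``stamp_key_fingerprint`` is a hypothetical name, it operates on a ``dict`` rather than a vcrpy request object, and the way the secrets are combined before hashing is a guess. The header name ``x-litellm-key-fp``, the SHA-256 hash, and the "no-key" bucket come from this message. The sketch also returns early when the fingerprint header is already present, so applying it twice does not overwrite the stored value.

```python
import hashlib
from typing import Dict

# Header name from the commit text; the auth-header list is illustrative.
FINGERPRINT_HEADER = "x-litellm-key-fp"
AUTH_HEADERS = ("authorization", "x-api-key")


def stamp_key_fingerprint(headers: Dict[str, str]) -> Dict[str, str]:
    """Replace auth headers with a one-way SHA-256 fingerprint of their values."""
    if FINGERPRINT_HEADER in headers:
        return headers  # already stamped; re-applying is a no-op
    secrets = [v for k, v in sorted(headers.items()) if k.lower() in AUTH_HEADERS]
    if secrets:
        fingerprint = hashlib.sha256("\n".join(secrets).encode("utf-8")).hexdigest()
    else:
        fingerprint = "no-key"
    # Strip the real secrets only after hashing them.
    scrubbed = {k: v for k, v in headers.items() if k.lower() not in AUTH_HEADERS}
    scrubbed[FINGERPRINT_HEADER] = fingerprint
    return scrubbed
```

Because a good key and a bad key hash to different fingerprints, a matcher that compares this header places their requests in different cassette buckets without storing either secret.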
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): make _before_record_request idempotent vcrpy invokes before_record_request more than once per request: can_play_response_for calls it, then __contains__ / _responses (reached via play_response) call it again on the result. The second invocation sees a request whose auth headers we already stripped, so a naive recompute yields "no-key" and overwrites the real fingerprint stored in the header. This makes can_play_response_for and play_response disagree on matchability — the former says "yes, we have a stored response for this" (matching no-key to no-key) and the latter throws UnhandledHTTPRequestError because it computes a fresh real fingerprint that doesn't match the stored no-key. In CI this manifested as ~30 failing tests across guardrails_testing, audio_testing, batches_testing, image_gen_testing, llm_responses_api, litellm_router_unit_testing, etc. Skip the recompute when the header is already set, so re-applying the hook is a no-op. Adds a regression test that fires the hook twice on the same dict and asserts the fingerprint stays put. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(vcr): drop more redundant docstrings and headers * test(vcr): enable 24hr cache for ocr_tests and search_tests These two directories were the only non-dockerized test suites in the build_and_test workflow that make live LLM/provider API calls but were not VCR-enabled by this PR. Together they account for 96 tests: - tests/ocr_tests/ (31): Mistral OCR, Azure AI OCR, Azure Document Intelligence, Vertex AI OCR. Pure-unit tests inside the same files (e.g. TestAzureDocumentIntelligencePagesParam) make no HTTP calls and become benign VCR NOOPs. - tests/search_tests/ (65): Brave, DataForSEO, DuckDuckGo, Exa, Firecrawl, Google PSE, Linkup, Parallel.ai, Perplexity, SearchAPI, Searxng, Serper, Tavily. 
Both directories use the canonical minimal conftest pattern from tests/audio_tests/conftest.py with no skip lists. None of the test files use respx, none assert on per-call upstream non-determinism (no response1.id != response2.id, no overhead-as-fraction-of-total, no live polling), so the default match_on tuple should cache cleanly. If a flake surfaces during the first cassette-recording CI run, we can add a targeted skip the same way we did for the other dirs. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * [Fix] Team UI: handle legacy dict shape for metadata.guardrails (#27224) * [Fix] Team UI: handle legacy dict shape for metadata.guardrails A team can have metadata.guardrails stored as {"modify_guardrails": bool} (the permission-flag shape introduced in PR #4810) rather than the expected string[]. The opt-out logic added in PR #25575 calls .filter() on this field, which throws TypeError on a dict and crashes the team detail page. Add a safeGuardrailsList helper that returns [] when the field is not an array, and route the three read sites through it. * [Fix] Team UI: inline Array.isArray guards for guardrails metadata Replace the safeGuardrailsList helper with inline Array.isArray checks at each call site, and apply the same guard to opted_out_global_guardrails for consistency. No known legacy dict rows for opted_out_global_guardrails, but the unguarded `|| []` pattern is the same shape risk. Six call sites now defended directly: three for metadata.guardrails and three for metadata.opted_out_global_guardrails. 
* chore: update Next.js build artifacts (2026-05-05 22:45 UTC, node v20.20.2) (#27240) * [Infra] Bump deps (#27157) * bump: version 0.4.70 → 0.4.71 * bump: version 0.1.39 → 0.1.40 * uv lock --------- Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: shivam <shivam@berri.ai> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: harish-berri <harish@berri.ai> Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local> Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-05-05 16:15:03 -07:00 · 6ff668c7aa
commit 6ff668c7aa
parent 934ecdca78
1092 changed files with 54847 additions and 9580 deletions
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -1475,7 +1475,7 @@ jobs:
      - run:
          name: Run tests
          command: |
-            uv run --no-sync python -m pytest -v tests/otel_tests -x --junitxml=test-results/junit.xml --durations=5
+            uv run --no-sync python -m pytest -v tests/otel_tests --junitxml=test-results/junit.xml --durations=5
          no_output_timeout: 15m
            # Clean up first container
      - run:
@@ -1935,7 +1935,7 @@ jobs:
          name: Run Vertex AI, Google AI Studio Node.js tests
          command: |
            cd tests/pass_through_tests
-            npx jest . --verbose
+            NODE_OPTIONS=--experimental-vm-modules npx jest . --verbose
          no_output_timeout: 30m
      - run:
          name: Run tests
@ -2138,17 +2138,23 @@ jobs:
            - ~/.cache/uv
      - restore_cache:
          keys:
-            - ui-e2e-node-deps-v1-{{ checksum "ui/litellm-dashboard/package-lock.json" }}
+            - ui-e2e-node-deps-v2-{{ checksum "ui/litellm-dashboard/package-lock.json" }}
      - run:
          name: Install Node dependencies and Playwright
+          # The cimg/python:3.12-browsers image already ships the Chromium system
+          # libraries Playwright needs (libnss3, libatk-bridge2.0-0, libcups2, etc.).
+          # `--with-deps` triggers a redundant apt-get update + install that adds
+          # 5-10 minutes to the job and frequently stalls on flaky Ubuntu mirrors,
+          # so we install just the browser binary.
          command: |
            cd ui/litellm-dashboard
            npm ci
-            npx playwright install chromium --with-deps
+            npx playwright install chromium
      - save_cache:
-          key: ui-e2e-node-deps-v1-{{ checksum "ui/litellm-dashboard/package-lock.json" }}
+          key: ui-e2e-node-deps-v2-{{ checksum "ui/litellm-dashboard/package-lock.json" }}
          paths:
            - ui/litellm-dashboard/node_modules
+            - ~/.cache/ms-playwright
      - run:
          name: Build UI from source
          # Prior version used `cp -r out/ ../../litellm/proxy/_experimental/out/`.
--- a/.github/workflows/check-lazy-openapi-snapshot.yml
+++ b/.github/workflows/check-lazy-openapi-snapshot.yml
@ -1,75 +0,0 @@
-name: Check Lazy OpenAPI Snapshot
-
-on:
-  pull_request:
-    branches:
-      - main
-      - litellm_internal_staging
-      - "litellm_**"
-
-permissions:
-  contents: read
-  checks: write
-
-concurrency:
-  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
-  cancel-in-progress: true
-
-jobs:
-  verify:
-    runs-on: ubuntu-latest
-    timeout-minutes: 10
-    steps:
-      - uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4.3.0
-        with:
-          persist-credentials: false
-
-      - name: Set up Python
-        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
-        with:
-          python-version: "3.12"
-
-      - name: Set up uv
-        uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7
-        with:
-          version: "0.10.9"
-
-      - name: Cache uv dependencies
-        uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
-        with:
-          path: |
-            ~/.cache/uv
-            .venv
-          key: ${{ runner.os }}-uv-${{ hashFiles('uv.lock') }}
-          restore-keys: |
-            ${{ runner.os }}-uv-
-
-      - name: Install dependencies
-        run: uv sync --frozen --all-groups --all-extras
-
-      - name: Regenerate snapshot to /tmp
-        id: regen
-        run: |
-          cp litellm/proxy/_lazy_openapi_snapshot.json /tmp/snapshot.committed.json
-          uv run --no-sync python -m litellm.proxy._lazy_openapi_snapshot
-          mv litellm/proxy/_lazy_openapi_snapshot.json /tmp/snapshot.fresh.json
-          mv /tmp/snapshot.committed.json litellm/proxy/_lazy_openapi_snapshot.json
-
-      - name: Compare
-        id: diff
-        continue-on-error: true
-        run: |
-          diff -q /tmp/snapshot.fresh.json litellm/proxy/_lazy_openapi_snapshot.json
-
-      - name: Mark neutral if drift
-        if: steps.diff.outcome == 'failure'
-        uses: LouisBrunner/checks-action@6b626ffbad7cc56fd58627f774b9067e6118af23 # v2.0.0
-        with:
-          token: ${{ secrets.GITHUB_TOKEN }}
-          name: lazy-openapi-snapshot
-          conclusion: neutral
-          output: |
-            {
-              "title": "Lazy openapi snapshot is stale",
-              "summary": "Run `python -m litellm.proxy._lazy_openapi_snapshot` and commit the regenerated `litellm/proxy/_lazy_openapi_snapshot.json`. Not blocking — the snapshot will regenerate at release if not committed."
-            }
--- a/.github/workflows/create-release.yml
+++ b/.github/workflows/create-release.yml
@ -4,7 +4,7 @@ on:
  workflow_dispatch:
    inputs:
      tag:
-        description: "Release tag (e.g. 1.84.0, 1.84.0rc1, 1.84.0.dev42, 1.84.0.post1; legacy v1.83.10-stable still accepted)"
+        description: "Release tag (e.g. 1.84.0, 1.84.0rc1, 1.84.0.dev42, 1.84.0-dev.2, 1.84.0.post1; legacy v1.83.10-stable still accepted)"
        required: true
        type: string
      commit_hash:
@ -46,9 +46,11 @@ jobs:
            const commitHash = process.env.COMMIT_HASH;

            // Mark RC / dev / nightly / alpha / beta tags as GitHub pre-releases.
+            // Accept both PEP 440 (`.dev`) and SemVer (`-dev`) separators so tags
+            // like `1.84.0.dev2` and `1.84.0-dev.2` are both detected.
            // PEP 440 post-releases (e.g. `1.84.0.post1`) and legacy `-stable[.patch.N]`
            // are stable maintenance releases, not pre-releases.
-            const isPrerelease = /(?:rc|nightly|alpha|beta|\.dev)/i.test(tag);
+            const isPrerelease = /(?:rc|nightly|alpha|beta|[-.]dev)/i.test(tag);

            const cosignSection = [
              `## Verify Docker Image Signature`,
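
Reviewer note: the effect of widening `\.dev` to `[-.]dev` can be checked with a small Python port of the two patterns. This is a sketch, not part of the workflow; the names `OLD_PRERELEASE`, `NEW_PRERELEASE`, and `is_prerelease` are made up for illustration, and `re.search` with `re.IGNORECASE` stands in for JavaScript's `/.../i.test(tag)`.

```python
import re

# Python ports of the two JavaScript patterns in the hunk above.
OLD_PRERELEASE = re.compile(r"(?:rc|nightly|alpha|beta|\.dev)", re.IGNORECASE)
NEW_PRERELEASE = re.compile(r"(?:rc|nightly|alpha|beta|[-.]dev)", re.IGNORECASE)


def is_prerelease(tag: str, pattern: re.Pattern = NEW_PRERELEASE) -> bool:
    """Return True when the tag contains any pre-release marker (substring search)."""
    return pattern.search(tag) is not None


if __name__ == "__main__":
    tags = [
        "1.84.0",
        "1.84.0rc1",
        "1.84.0.dev42",
        "1.84.0-dev.2",
        "1.84.0.post1",
        "v1.83.10-stable",
    ]
    for tag in tags:
        # Columns: tag, result with the old pattern, result with the new pattern.
        print(tag, is_prerelease(tag, OLD_PRERELEASE), is_prerelease(tag))
```

Only the SemVer-style tag `1.84.0-dev.2` changes classification between the two patterns. Both patterns are unanchored, so any tag that merely contains one of the markers as a substring is also treated as a pre-release.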
--- a/.gitignore
+++ b/.gitignore
@ -90,7 +90,6 @@ test.py
 litellm_config.yaml
 !.github/observatory/litellm_config.yaml
 .cursor
-.vscode/launch.json
 litellm/proxy/to_delete_loadtest_work/*
 update_model_cost_map.py
 tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py
@ -100,4 +99,5 @@ STABILIZATION_TODO.md
 **/test-results
 **/playwright-report
 **/*.storageState.json
-**/coverage
+**/coverage
+test-config
--- a/6
+++ b/6
@ -1,9 +1,9 @@
 # Base image for building
-ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:f26d42a15d09d9a643b231df929fa3cf609bedc58a728eb445be89a9d8d1da9f
+ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31

 # Runtime image
-ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:f26d42a15d09d9a643b231df929fa3cf609bedc58a728eb445be89a9d8d1da9f
-ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:733b4042187702f832f7fdecb3aff14a61b288c4ca37af188bb5715c1caebaf8
+ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a

 FROM $UV_IMAGE AS uvbin

--- a/3
+++ b/3
@ -185,3 +185,6 @@ test-llm-translation-single: install-test-deps
 	$(UV_RUN) pytest tests/llm_translation/$(FILE) \
 		--junitxml=test-results/junit.xml \
 		-v --tb=short --maxfail=100 --timeout=300
+
+test-llm-translation-flush-vcr-cache:
+	$(UV_RUN) python tests/_flush_vcr_cache.py
--- a/README.md
+++ b/README.md
@ -68,7 +68,7 @@ Managing LLM calls across providers gets complicated fast — different SDKs, au
    <td><img height="60" alt="Stripe" src="https://github.com/user-attachments/assets/f7296d4f-9fbd-460d-9d05-e4df31697c4b" /></td>
    <td><img height="60" alt="image" src="https://github.com/user-attachments/assets/436fca71-988b-40bb-b5fe-8450c80fdbd0" /></td>
    <td><img height="60" alt="Google ADK" src="https://github.com/user-attachments/assets/caf270a2-5aee-45c4-8222-41a2070c4f19" /></td>
-    <td><img height="60" alt="Greptile" src="https://github.com/user-attachments/assets/0be4bd8a-7cfa-48d3-9090-f415fe948280" /></td>
+    <td><img height="60" alt="Greptile" src="https://github.com/user-attachments/assets/3db0ae72-0843-4005-a56d-bba1dde2193d" /></td>
    <td><img height="60" alt="OpenHands" src="https://github.com/user-attachments/assets/a6150c4c-149e-4cae-888b-8b92be6e003f" /></td>
    <td><h2>Netflix</h2></td>
    <td><img height="60" alt="OpenAI Agents SDK" src="https://github.com/user-attachments/assets/c02f7be0-8c2e-4d27-aea7-7c024bfaebc0" /></td>
--- a/cookbook/litellm-ollama-docker-image/requirements.txt
+++ b/cookbook/litellm-ollama-docker-image/requirements.txt
@ -1 +1 @@
-litellm==1.83.5
+litellm==1.83.14
--- a/docker/Dockerfile.alpine
+++ b/docker/Dockerfile.alpine
@ -3,7 +3,7 @@ ARG LITELLM_BUILD_IMAGE=python:3.11-alpine@sha256:f07e2ace46f560f09a6eeec7b4913b

 # Runtime image
 ARG LITELLM_RUNTIME_IMAGE=python:3.11-alpine@sha256:f07e2ace46f560f09a6eeec7b4913b80ee99546e749ef82342a419a326620856
-ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:733b4042187702f832f7fdecb3aff14a61b288c4ca37af188bb5715c1caebaf8
+ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a

 FROM $UV_IMAGE AS uvbin

--- a/docker/Dockerfile.database
+++ b/docker/Dockerfile.database
@ -1,9 +1,9 @@
 # Base image for building
-ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:f26d42a15d09d9a643b231df929fa3cf609bedc58a728eb445be89a9d8d1da9f
+ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31

 # Runtime image
-ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:f26d42a15d09d9a643b231df929fa3cf609bedc58a728eb445be89a9d8d1da9f
-ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:733b4042187702f832f7fdecb3aff14a61b288c4ca37af188bb5715c1caebaf8
+ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a

 FROM $UV_IMAGE AS uvbin

--- a/docker/Dockerfile.dev
+++ b/docker/Dockerfile.dev
@ -3,7 +3,7 @@ ARG LITELLM_BUILD_IMAGE=python:3.13-slim@sha256:739e7213785e88c0f702dcdc12c0973a

 # Runtime image
 ARG LITELLM_RUNTIME_IMAGE=python:3.13-slim@sha256:739e7213785e88c0f702dcdc12c0973afcbd606dbf021a589cab77d6b00b579d
-ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:733b4042187702f832f7fdecb3aff14a61b288c4ca37af188bb5715c1caebaf8
+ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a

 FROM $UV_IMAGE AS uvbin

--- a/docker/Dockerfile.health_check
+++ b/docker/Dockerfile.health_check
@ -1,4 +1,4 @@
-ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:733b4042187702f832f7fdecb3aff14a61b288c4ca37af188bb5715c1caebaf8
+ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a
 FROM $UV_IMAGE AS uvbin

 FROM python:3.13-slim@sha256:739e7213785e88c0f702dcdc12c0973afcbd606dbf021a589cab77d6b00b579d
--- a/docker/Dockerfile.non_root
+++ b/docker/Dockerfile.non_root
@ -1,8 +1,8 @@
 # Base images
-ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:f26d42a15d09d9a643b231df929fa3cf609bedc58a728eb445be89a9d8d1da9f
-ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:f26d42a15d09d9a643b231df929fa3cf609bedc58a728eb445be89a9d8d1da9f
+ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
+ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:3258be472764337fd13095bcbb3182da170243b5819fd67ad4c0754590588b31
 ARG PROXY_EXTRAS_SOURCE=published
-ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:733b4042187702f832f7fdecb3aff14a61b288c4ca37af188bb5715c1caebaf8
+ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a

 FROM $UV_IMAGE AS uvbin

@ -32,7 +32,6 @@ ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
    PATH="/app/.venv/bin:${PATH}" \
    LITELLM_NON_ROOT=true \
    PRISMA_BINARY_CACHE_DIR=/app/.cache/prisma-python/binaries \
-    PRISMA_CLI_BINARY_TARGETS="debian-openssl-3.0.x" \
    XDG_CACHE_HOME=/app/.cache

 # Copy dependency metadata first for layer caching
@ -114,7 +113,6 @@ COPY --from=builder /app/docker/supervisord.conf /etc/supervisord.conf

 ENV PATH="/app/.venv/bin:${PATH}" \
    PRISMA_BINARY_CACHE_DIR=/app/.cache/prisma-python/binaries \
-    PRISMA_CLI_BINARY_TARGETS="debian-openssl-3.0.x" \
    HOME=/app \
    LITELLM_NON_ROOT=true \
    XDG_CACHE_HOME=/app/.cache \
--- a/docs/my-website/docs/providers/crusoe.md
+++ b/docs/my-website/docs/providers/crusoe.md
@ -0,0 +1,196 @@
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Crusoe
+
+## Overview
+
+| Property | Details |
+|-------|-------|
+| Description | Crusoe Cloud provides GPU-accelerated inference for open-source large language models, optimized for performance and cost efficiency. |
+| Provider Route on LiteLLM | `crusoe/` |
+| Link to Provider Doc | [Crusoe Managed Inference Documentation ↗](https://docs.crusoecloud.com/managed-inference/overview/index.html) |
+| Base URL | `https://managed-inference-api-proxy.crusoecloud.com/v1` |
+| Supported Operations | [`/chat/completions`](#sample-usage) |
+
+<br />
+<br />
+
+**We support ALL Crusoe models. Just set `crusoe/` as a prefix when sending completion requests.**
+
+## Available Models
+
+| Model | Description | Context Window |
+|-------|-------------|----------------|
+| `crusoe/deepseek-ai/DeepSeek-R1-0528` | DeepSeek R1 reasoning model (May 2025) | 163,840 tokens |
+| `crusoe/deepseek-ai/DeepSeek-V3-0324` | DeepSeek V3 chat model (March 2025) | 163,840 tokens |
+| `crusoe/google/gemma-3-12b-it` | Google Gemma 3 12B instruction-tuned | 131,072 tokens |
+| `crusoe/meta-llama/Llama-3.3-70B-Instruct` | Llama 3.3 70B instruction-tuned | 131,072 tokens |
+| `crusoe/moonshotai/Kimi-K2-Thinking` | Kimi K2 extended thinking model | 262,144 tokens |
+| `crusoe/openai/gpt-oss-120b` | OpenAI 120B open-source model | 131,072 tokens |
+| `crusoe/Qwen/Qwen3-235B-A22B-Instruct-2507` | Qwen3 235B MoE instruction-tuned | 262,144 tokens |
+
+## Required Variables
+
+```python showLineNumbers title="Environment Variables"
+import os
+
+os.environ["CRUSOE_API_KEY"] = ""  # your Crusoe API key
+```
+
+## Usage - LiteLLM Python SDK
+
+### Non-streaming
+
+```python showLineNumbers title="Crusoe Non-streaming Completion"
+import os
+import litellm
+from litellm import completion
+
+os.environ["CRUSOE_API_KEY"] = ""  # your Crusoe API key
+
+messages = [{"content": "Hello, how are you?", "role": "user"}]
+
+# Crusoe call
+response = completion(
+    model="crusoe/meta-llama/Llama-3.3-70B-Instruct",
+    messages=messages
+)
+
+print(response)
+```
+
+### Streaming
+
+```python showLineNumbers title="Crusoe Streaming Completion"
+import os
+import litellm
+from litellm import completion
+
+os.environ["CRUSOE_API_KEY"] = ""  # your Crusoe API key
+
+messages = [{"content": "Write a short story about AI", "role": "user"}]
+
+# Crusoe call with streaming
+response = completion(
+    model="crusoe/meta-llama/Llama-3.3-70B-Instruct",
+    messages=messages,
+    stream=True
+)
+
+for chunk in response:
+    print(chunk)
+```
+
+### Function Calling
+
+```python showLineNumbers title="Crusoe Function Calling"
+import os
+import litellm
+from litellm import completion
+
+os.environ["CRUSOE_API_KEY"] = ""  # your Crusoe API key
+
+tools = [{
+    "type": "function",
+    "function": {
+        "name": "get_weather",
+        "description": "Get the current weather in a location",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "location": {
+                    "type": "string",
+                    "description": "The city and state, e.g. San Francisco, CA"
+                }
+            },
+            "required": ["location"]
+        }
+    }
+}]
+
+messages = [{"role": "user", "content": "What's the weather in Boston?"}]
+
+response = completion(
+    model="crusoe/meta-llama/Llama-3.3-70B-Instruct",
+    messages=messages,
+    tools=tools,
+    tool_choice="auto"
+)
+
+print(response)
+```
+
+## Usage - LiteLLM Proxy Server
+
+```yaml showLineNumbers title="config.yaml"
+model_list:
+  - model_name: llama-3.3-70b
+    litellm_params:
+      model: crusoe/meta-llama/Llama-3.3-70B-Instruct
+      api_key: os.environ/CRUSOE_API_KEY
+  - model_name: deepseek-r1
+    litellm_params:
+      model: crusoe/deepseek-ai/DeepSeek-R1-0528
+      api_key: os.environ/CRUSOE_API_KEY
+  - model_name: deepseek-v3
+    litellm_params:
+      model: crusoe/deepseek-ai/DeepSeek-V3-0324
+      api_key: os.environ/CRUSOE_API_KEY
+  - model_name: qwen3-235b
+    litellm_params:
+      model: crusoe/Qwen/Qwen3-235B-A22B-Instruct-2507
+      api_key: os.environ/CRUSOE_API_KEY
+  - model_name: kimi-k2
+    litellm_params:
+      model: crusoe/moonshotai/Kimi-K2-Thinking
+      api_key: os.environ/CRUSOE_API_KEY
+```
+
+## Custom API Base
+
+**Option 1: Environment variable**
+
+```python showLineNumbers title="Custom API Base via env var"
+import os
+from litellm import completion
+
+os.environ["CRUSOE_API_BASE"] = "https://custom.crusoecloud.com/v1"
+os.environ["CRUSOE_API_KEY"] = ""  # your API key
+
+response = completion(
+    model="crusoe/meta-llama/Llama-3.3-70B-Instruct",
+    messages=[{"content": "Hello!", "role": "user"}],
+)
+```
+
+**Option 2: Pass directly**
+
+```python showLineNumbers title="Custom API Base via parameter"
+from litellm import completion
+
+response = completion(
+    model="crusoe/meta-llama/Llama-3.3-70B-Instruct",
+    messages=[{"content": "Hello!", "role": "user"}],
+    api_base="https://custom.crusoecloud.com/v1",
+    api_key="your-api-key",
+)
+```
+
+## Supported OpenAI Parameters
+
+- `temperature`
+- `max_tokens`
+- `max_completion_tokens`
+- `top_p`
+- `frequency_penalty`
+- `presence_penalty`
+- `stop`
+- `n`
+- `stream`
+- `tools`
+- `tool_choice`
+- `response_format`
+- `seed`
+- `user`
+- `logit_bias`
+- `logprobs`
+- `top_logprobs`
--- a/enterprise/enterprise_hooks/banned_keywords.py
+++ b/enterprise/enterprise_hooks/banned_keywords.py
@ -11,6 +11,10 @@ from typing import Literal
 import litellm
 from litellm.caching.caching import DualCache
 from litellm.proxy._types import UserAPIKeyAuth
+from litellm.proxy.guardrails._content_utils import (
+    is_text_content_call_type,
+    iter_message_text,
+)
 from litellm.integrations.custom_logger import CustomLogger
 from litellm._logging import verbose_proxy_logger
 from fastapi import HTTPException
@ -73,10 +77,9 @@ class _ENTERPRISE_BannedKeywords(CustomLogger):
            - check if user id part of blocked list
            """
            self.print_verbose("Inside Banned Keyword List Pre-Call Hook")
-            if call_type == "completion" and "messages" in data:
-                for m in data["messages"]:
-                    if "content" in m and isinstance(m["content"], str):
-                        self.test_violation(test_str=m["content"])
+            if is_text_content_call_type(call_type):
+                for text in iter_message_text(data):
+                    self.test_violation(test_str=text)

        except HTTPException as e:
            raise e
@ -93,11 +96,16 @@ class _ENTERPRISE_BannedKeywords(CustomLogger):
        user_api_key_dict: UserAPIKeyAuth,
        response,
    ):
-        if isinstance(response, litellm.ModelResponse) and isinstance(
-            response.choices[0], litellm.utils.Choices
-        ):
-            for word in self.banned_keywords_list:
-                self.test_violation(test_str=response.choices[0].message.content or "")
+        if not isinstance(response, litellm.ModelResponse):
+            return
+
+        for choice in response.choices:
+            if not isinstance(choice, litellm.utils.Choices):
+                continue
+            message = getattr(choice, "message", None)
+            content = getattr(message, "content", None)
+            if isinstance(content, str):
+                self.test_violation(test_str=content)

    async def async_post_call_streaming_hook(
        self,
--- a/enterprise/enterprise_hooks/google_text_moderation.py
+++ b/enterprise/enterprise_hooks/google_text_moderation.py
@ -12,6 +12,7 @@ import litellm
 from litellm._logging import verbose_proxy_logger
 from litellm.integrations.custom_logger import CustomLogger
 from litellm.proxy._types import UserAPIKeyAuth
+from litellm.proxy.guardrails._content_utils import iter_message_text
 from litellm.types.utils import CallTypesLiteral


@ -94,11 +95,9 @@ class _ENTERPRISE_GoogleTextModeration(CustomLogger):
        - Calls Google's Text Moderation API
        - Rejects request if it fails safety check
        """
-        if "messages" in data and isinstance(data["messages"], list):
-            text = ""
-            for m in data["messages"]:  # assume messages is a list
-                if "content" in m and isinstance(m["content"], str):
-                    text += m["content"]
+        # Covers multimodal list content + Responses-API input.
+        text = "".join(iter_message_text(data))
+        if text:
            document = self.language_document(content=text, type_=self.document_type)

            request = self.moderate_text_request(
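
Reviewer note: the implementation of `iter_message_text` is not part of this excerpt, so the following is only a hypothetical sketch of what "multimodal list content + Responses-API input" coverage could look like. The function name `iter_text_sketch` and its traversal rules are assumptions for illustration. The real helper in `litellm.proxy.guardrails._content_utils` may differ.

```python
from typing import Any, Dict, Iterator


def iter_text_sketch(data: Dict[str, Any]) -> Iterator[str]:
    """Yield text found in chat `messages` and in Responses-style `input` (illustrative only)."""

    def from_content(content: Any) -> Iterator[str]:
        # Plain string content, as in classic chat completions.
        if isinstance(content, str):
            yield content
        # Multimodal content: a list of parts, some of which carry a "text" field.
        elif isinstance(content, list):
            for part in content:
                if isinstance(part, str):
                    yield part
                elif isinstance(part, dict) and isinstance(part.get("text"), str):
                    yield part["text"]

    messages = data.get("messages")
    if isinstance(messages, list):
        for message in messages:
            if isinstance(message, dict):
                yield from from_content(message.get("content"))

    # `input` may be a bare string, a list of strings, or a list of message-like dicts.
    raw_input = data.get("input")
    if isinstance(raw_input, str):
        yield raw_input
    elif isinstance(raw_input, list):
        for item in raw_input:
            if isinstance(item, str):
                yield item
            elif isinstance(item, dict):
                yield from from_content(item.get("content"))
```

The removed code only concatenated `content` when it was a plain string, so text inside list-style content parts or under `input` never reached the moderation call. A generator like this lets the caller keep the one-line `"".join(...)` shown in the hunk.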
--- a/enterprise/enterprise_hooks/openai_moderation.py
+++ b/enterprise/enterprise_hooks/openai_moderation.py
@ -19,6 +19,7 @@ import litellm
 from litellm._logging import verbose_proxy_logger
 from litellm.integrations.custom_logger import CustomLogger
 from litellm.proxy._types import UserAPIKeyAuth
+from litellm.proxy.guardrails._content_utils import iter_message_text
 from litellm.types.utils import CallTypesLiteral


@ -37,11 +38,8 @@ class _ENTERPRISE_OpenAI_Moderation(CustomLogger):
        user_api_key_dict: UserAPIKeyAuth,
        call_type: CallTypesLiteral,
    ):
-        text = ""
-        if "messages" in data and isinstance(data["messages"], list):
-            for m in data["messages"]:  # assume messages is a list
-                if "content" in m and isinstance(m["content"], str):
-                    text += m["content"]
+        # Covers multimodal list content + Responses-API input.
+        text = "".join(iter_message_text(data))

        from litellm.proxy.proxy_server import llm_router

--- a/enterprise/litellm_enterprise/enterprise_callbacks/secret_detection.py
+++ b/enterprise/litellm_enterprise/enterprise_callbacks/secret_detection.py
@ -18,6 +18,7 @@ from litellm._logging import verbose_proxy_logger
 from litellm.caching.caching import DualCache
 from litellm.integrations.custom_guardrail import CustomGuardrail
 from litellm.proxy._types import UserAPIKeyAuth
+from litellm.proxy.guardrails._content_utils import walk_user_text

 GUARDRAIL_NAME = "hide_secrets"

@ -473,23 +474,19 @@ class _ENTERPRISE_SecretDetection(CustomGuardrail):
        if await self.should_run_check(user_api_key_dict) is False:
            return

-        if "messages" in data and isinstance(data["messages"], list):
-            for message in data["messages"]:
-                if "content" in message and isinstance(message["content"], str):
-                    detected_secrets = self.scan_message_for_secrets(message["content"])
+        # Covers multimodal list content + Responses-API input.
+        def _redact_message_text(text: str) -> str:
+            detected_secrets = self.scan_message_for_secrets(text)
+            for secret in detected_secrets:
+                text = text.replace(secret["value"], "[REDACTED]")
+            if detected_secrets:
+                secret_types = [secret["type"] for secret in detected_secrets]
+                verbose_proxy_logger.warning(
+                    f"Detected and redacted secrets in message: {secret_types}"
+                )
+            return text

-                    for secret in detected_secrets:
-                        message["content"] = message["content"].replace(
-                            secret["value"], "[REDACTED]"
-                        )
-
-                    if len(detected_secrets) > 0:
-                        secret_types = [secret["type"] for secret in detected_secrets]
-                        verbose_proxy_logger.warning(
-                            f"Detected and redacted secrets in message: {secret_types}"
-                        )
-                    else:
-                        verbose_proxy_logger.debug("No secrets detected on input.")
+        walk_user_text(data, _redact_message_text)

        if "prompt" in data:
            if isinstance(data["prompt"], str):
@ -504,11 +501,15 @@ class _ENTERPRISE_SecretDetection(CustomGuardrail):
                        f"Detected and redacted secrets in prompt: {secret_types}"
                    )
            elif isinstance(data["prompt"], list):
-                for item in data["prompt"]:
+                # Index back into the list — assigning to ``item`` would only
+                # rebind the loop variable and leave ``data["prompt"]``
+                # carrying the unredacted secret.
+                for idx, item in enumerate(data["prompt"]):
                    if isinstance(item, str):
                        detected_secrets = self.scan_message_for_secrets(item)
                        for secret in detected_secrets:
                            item = item.replace(secret["value"], "[REDACTED]")
+                        data["prompt"][idx] = item
                        if len(detected_secrets) > 0:
                            secret_types = [
                                secret["type"] for secret in detected_secrets
@ -517,31 +518,6 @@ class _ENTERPRISE_SecretDetection(CustomGuardrail):
                                f"Detected and redacted secrets in prompt: {secret_types}"
                            )

-        if "input" in data:
-            if isinstance(data["input"], str):
-                detected_secrets = self.scan_message_for_secrets(data["input"])
-                for secret in detected_secrets:
-                    data["input"] = data["input"].replace(secret["value"], "[REDACTED]")
-                if len(detected_secrets) > 0:
-                    secret_types = [secret["type"] for secret in detected_secrets]
-                    verbose_proxy_logger.warning(
-                        f"Detected and redacted secrets in input: {secret_types}"
-                    )
-            elif isinstance(data["input"], list):
-                _input_in_request = data["input"]
-                for idx, item in enumerate(_input_in_request):
-                    if isinstance(item, str):
-                        detected_secrets = self.scan_message_for_secrets(item)
-                        for secret in detected_secrets:
-                            _input_in_request[idx] = item.replace(
-                                secret["value"], "[REDACTED]"
-                            )
-                        if len(detected_secrets) > 0:
-                            secret_types = [
-                                secret["type"] for secret in detected_secrets
-                            ]
-                            verbose_proxy_logger.warning(
-                                f"Detected and redacted secrets in input: {secret_types}"
-                            )
-                verbose_proxy_logger.debug("Data after redacting input %s", data)
+        # ``data["input"]`` (Responses API and embeddings/moderation) is
+        # already covered by ``walk_user_text`` above.
        return
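
Reviewer note: the list-prompt fix in this file depends on a general Python behaviour that is easy to reproduce in isolation. The sketch below uses made-up function names and a literal secret string instead of `scan_message_for_secrets`. It only demonstrates why assigning to the loop variable does not change the list, while writing back by index does.

```python
from typing import Any, List


def redact_by_rebinding(prompts: List[Any], secret: str) -> List[Any]:
    """Old pattern: `item = ...` rebinds the loop variable and leaves the list untouched."""
    for item in prompts:
        if isinstance(item, str):
            item = item.replace(secret, "[REDACTED]")  # result is discarded
    return prompts


def redact_in_place(prompts: List[Any], secret: str) -> List[Any]:
    """Fixed pattern: write the redacted string back through its index."""
    for idx, item in enumerate(prompts):
        if isinstance(item, str):
            prompts[idx] = item.replace(secret, "[REDACTED]")
    return prompts


if __name__ == "__main__":
    print(redact_by_rebinding(["key sk-123", 7], "sk-123"))
    print(redact_in_place(["key sk-123", 7], "sk-123"))
```

Strings are immutable, so `str.replace` always returns a new object. The only way for the caller's list to see the redacted value is an explicit store such as `prompts[idx] = ...`, which is what the `data["prompt"][idx] = item` line in the hunk does.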
--- a/enterprise/litellm_enterprise/proxy/auth/custom_sso_handler.py
+++ b/enterprise/litellm_enterprise/proxy/auth/custom_sso_handler.py
@ -10,28 +10,21 @@ has already authenticated the user) and you need to extract user information fro
 custom headers or other request attributes.
 """

-from typing import TYPE_CHECKING, Dict, Optional, Union, cast
+from typing import cast

 from fastapi import Request
 from fastapi.responses import RedirectResponse

-if TYPE_CHECKING:
-    from fastapi_sso.sso.base import OpenID
-else:
-    from typing import Any as OpenID
-
-from litellm.proxy.management_endpoints.types import CustomOpenID
-

 class EnterpriseCustomSSOHandler:
    """
    Enterprise Custom SSO Handler for LiteLLM Proxy
-    
+
    This class provides methods for handling custom SSO authentication flows
    where users can implement their own authentication logic by processing
    request headers and returning user information in OpenID format.
    """
-    
+
    @staticmethod
    async def handle_custom_ui_sso_sign_in(
        request: Request,
@ -40,16 +33,16 @@ class EnterpriseCustomSSOHandler:
        Allow a user to execute their custom code to parse incoming request headers and return a OpenID object

        Use this when you have an OAuth proxy in front of LiteLLM (where the OAuth proxy has already authenticated the user)
-        
+
        Args:
            request: The FastAPI request object containing headers and other request data
-            
+
        Returns:
            RedirectResponse: Redirect response that sends the user to the LiteLLM UI with authentication token
-            
+
        Raises:
            ValueError: If custom_ui_sso_sign_in_handler is not configured
-            
+
        Example:
            This method is typically called when a user has already been authenticated by an
            external OAuth proxy and the proxy has added custom headers containing user information.
@ -60,27 +53,44 @@ class EnterpriseCustomSSOHandler:
        from litellm.integrations.custom_sso_handler import CustomSSOLoginHandler
        from litellm.proxy.proxy_server import (
            CommonProxyErrors,
+            general_settings,
            premium_user,
            user_custom_ui_sso_sign_in_handler,
        )
+        from litellm.proxy.auth.trusted_proxy_utils import (
+            require_trusted_proxy_request,
+        )
+
        if premium_user is not True:
            raise ValueError(CommonProxyErrors.not_premium_user.value)
-        
+
        if user_custom_ui_sso_sign_in_handler is None:
-            raise ValueError("custom_ui_sso_sign_in_handler is not configured. Please set it in general_settings.")
-        
-        custom_sso_login_handler = cast(CustomSSOLoginHandler, user_custom_ui_sso_sign_in_handler)
-        openid_response: OpenID = await custom_sso_login_handler.handle_custom_ui_sso_sign_in(
+            raise ValueError(
+                "custom_ui_sso_sign_in_handler is not configured. Please set it in general_settings."
+            )
+
+        require_trusted_proxy_request(
            request=request,
+            general_settings=general_settings,
+            feature_name="Custom UI SSO",
        )
-        
+
+        custom_sso_login_handler = cast(
+            CustomSSOLoginHandler, user_custom_ui_sso_sign_in_handler
+        )
+        openid_response: OpenID = (
+            await custom_sso_login_handler.handle_custom_ui_sso_sign_in(
+                request=request,
+            )
+        )
+
        # Import here to avoid circular imports
        from litellm.proxy.management_endpoints.ui_sso import SSOAuthenticationHandler
-        
+
        return await SSOAuthenticationHandler.get_redirect_response_from_openid(
            result=openid_response,
            request=request,
            received_response=None,
            generic_client_id=None,
            ui_access_mode=None,
-        ) 
+        )
--- a/enterprise/litellm_enterprise/proxy/hooks/managed_files.py
+++ b/enterprise/litellm_enterprise/proxy/hooks/managed_files.py
@ -15,6 +15,11 @@ from litellm.caching.caching import DualCache
 from litellm.integrations.custom_logger import CustomLogger
 from litellm.litellm_core_utils.prompt_templates.common_utils import extract_file_data
 from litellm.llms.base_llm.files.transformation import BaseFileEndpoints
+from litellm.llms.base_llm.managed_resources.isolation import (
+    build_list_page,
+    build_owner_filter,
+    can_access_resource,
+)
 from litellm.proxy._types import (
    CallTypes,
    LiteLLM_ManagedFileTable,
@ -99,6 +104,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                model_mappings=model_mappings,
                flat_model_file_ids=list(model_mappings.values()),
                created_by=user_api_key_dict.user_id,
+                team_id=user_api_key_dict.team_id,
                updated_by=user_api_key_dict.user_id,
            )
            await self.internal_usage_cache.async_set_cache(
@ -114,6 +120,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
            "model_mappings": json.dumps(model_mappings),
            "flat_model_file_ids": list(model_mappings.values()),
            "created_by": user_api_key_dict.user_id,
+            "team_id": user_api_key_dict.team_id,
            "updated_by": user_api_key_dict.user_id,
        }

@ -125,7 +132,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                db_data["storage_backend"] = hidden_params["storage_backend"]
            if "storage_url" in hidden_params:
                db_data["storage_url"] = hidden_params["storage_url"]
-            
+
            verbose_logger.debug(
                f"Storage metadata: storage_backend={db_data.get('storage_backend')}, "
                f"storage_url={db_data.get('storage_url')}"
@ -171,6 +178,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                    "model_object_id": model_object_id,
                    "file_purpose": file_purpose,
                    "created_by": user_api_key_dict.user_id,
+                    "team_id": user_api_key_dict.team_id,
                    "updated_by": user_api_key_dict.user_id,
                    "status": file_object.status,
                },
@ -229,15 +237,16 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
    async def can_user_call_unified_file_id(
        self, unified_file_id: str, user_api_key_dict: UserAPIKeyAuth
    ) -> bool:
-        ## check if the user has access to the unified file id
-
-        user_id = user_api_key_dict.user_id
        managed_file = await self.prisma_client.db.litellm_managedfiletable.find_first(
            where={"unified_file_id": unified_file_id}
        )

        if managed_file:
-            return managed_file.created_by == user_id
+            return can_access_resource(
+                user_api_key_dict=user_api_key_dict,
+                created_by=managed_file.created_by,
+                resource_team_id=managed_file.team_id,
+            )
        raise HTTPException(
            status_code=404,
            detail=f"File not found: {unified_file_id}",
@ -246,8 +255,6 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
    async def can_user_call_unified_object_id(
        self, unified_object_id: str, user_api_key_dict: UserAPIKeyAuth
    ) -> bool:
-        ## check if the user has access to the unified object id
-        user_id = user_api_key_dict.user_id
        managed_object = (
            await self.prisma_client.db.litellm_managedobjecttable.find_first(
                where={"unified_object_id": unified_object_id}
@ -255,7 +262,11 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        )

        if managed_object:
-            return managed_object.created_by == user_id
+            return can_access_resource(
+                user_api_key_dict=user_api_key_dict,
+                created_by=managed_object.created_by,
+                resource_team_id=managed_object.team_id,
+            )
        raise HTTPException(
            status_code=404,
            detail=f"Object not found: {unified_object_id}",
@ -285,28 +296,27 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
            raise Exception(
                "Filtering by 'target_model_names' is not supported when using managed batches."
            )
-        
-        where_clause: Dict[str, Any] = {"file_purpose": "batch"}
-        
-        # Filter by user who created the batch
-        if user_api_key_dict.user_id:
-            where_clause["created_by"] = user_api_key_dict.user_id
-        
+
+        owner_filter = build_owner_filter(user_api_key_dict)
+        if owner_filter is None:
+            return build_list_page([])
+
+        where_clause: Dict[str, Any] = {"file_purpose": "batch", **owner_filter}
+
        if after:
            where_clause["id"] = {"gt": after}
-        
-        # Fetch more than needed to allow for post-fetch filtering
+
        fetch_limit = limit or 20
        if target_model_names:
-            # Fetch extra to account for filtering
+            # Oversample so post-fetch model-name filtering still has enough rows.
            fetch_limit = max(fetch_limit * 3, 100)
-        
+
        batches = await self.prisma_client.db.litellm_managedobjecttable.find_many(
            where=where_clause,
            take=fetch_limit,
            order={"created_at": "desc"},
        )
-                
+
        batch_objects: List[LiteLLMBatch] = []
        for batch in batches:
            try:
@ -314,7 +324,11 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                if len(batch_objects) >= (limit or 20):
                    break

-                batch_data = json.loads(batch.file_object) if isinstance(batch.file_object, str) else batch.file_object
+                batch_data = (
+                    json.loads(batch.file_object)
+                    if isinstance(batch.file_object, str)
+                    else batch.file_object
+                )
                batch_obj = LiteLLMBatch(**batch_data)
                batch_obj.id = batch.unified_object_id
                batch_objects.append(batch_obj)
@ -324,27 +338,29 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                    f"Failed to parse batch object {batch.unified_object_id}: {e}"
                )
                continue
-        
-        return {
-            "object": "list",
-            "data": batch_objects,
-            "first_id": batch_objects[0].id if batch_objects else None,
-            "last_id": batch_objects[-1].id if batch_objects else None,
-            "has_more": len(batch_objects) == (limit or 20),
-        }
+
+        return build_list_page(
+            batch_objects, has_more=len(batch_objects) == (limit or 20)
+        )

    async def get_user_created_file_ids(
        self, user_api_key_dict: UserAPIKeyAuth, model_object_ids: List[str]
    ) -> List[OpenAIFileObject]:
        """
-        Get all file ids created by the user for a list of model object ids
+        Get all file ids the caller is allowed to see for a list of model
+        object ids. Service-account keys (no user_id) are scoped to their
+        team via ``team_id``; admins see all matches.

        Returns:
         - List of OpenAIFileObject's
        """
+        owner_filter = build_owner_filter(user_api_key_dict)
+        if owner_filter is None:
+            return []
+
        file_ids = await self.prisma_client.db.litellm_managedfiletable.find_many(
            where={
-                "created_by": user_api_key_dict.user_id,
+                **owner_filter,
                "flat_model_file_ids": {"hasSome": model_object_ids},
            }
        )
@ -377,11 +393,11 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        """
        Check if the user has access to a list of file IDs.
        Only checks managed (unified) file IDs.
-        
+
        Args:
            file_ids: List of file IDs to check access for
            user_api_key_dict: User API key authentication details
-            
+
        Raises:
            HTTPException: If user doesn't have access to any of the files
        """
@ -419,10 +435,10 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        ### HANDLE TRANSFORMATIONS ###
        # Check both completion and acompletion call types
        is_completion_call = (
-            call_type == CallTypes.completion.value 
+            call_type == CallTypes.completion.value
            or call_type == CallTypes.acompletion.value
        )
-        
+
        if is_completion_call:
            messages = data.get("messages")
            model = data.get("model", "")
@ -431,22 +447,27 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                if file_ids:
                    # Check user has access to all managed files
                    await self.check_file_ids_access(file_ids, user_api_key_dict)
-                    
+
                    # Check if any files are stored in storage backends and need base64 conversion
                    # This is needed for Vertex AI/Gemini which requires base64 content
-                    is_vertex_ai = model and ("vertex_ai" in model or "gemini" in model.lower())
+                    is_vertex_ai = model and (
+                        "vertex_ai" in model or "gemini" in model.lower()
+                    )
                    if is_vertex_ai:
                        await self._convert_storage_files_to_base64(
                            messages=messages,
                            file_ids=file_ids,
                            litellm_parent_otel_span=user_api_key_dict.parent_otel_span,
                        )
-                    
+
                    model_file_id_mapping = await self.get_model_file_id_mapping(
                        file_ids, user_api_key_dict.parent_otel_span
                    )
                    data["model_file_id_mapping"] = model_file_id_mapping
-        elif call_type == CallTypes.aresponses.value or call_type == CallTypes.responses.value:
+        elif (
+            call_type == CallTypes.aresponses.value
+            or call_type == CallTypes.responses.value
+        ):
            # Handle managed files in responses API input and tools
            file_ids = []

@ -611,7 +632,9 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
            if model_id is None:
                model_id = cast(
                    Optional[str],
-                    kwargs.get("litellm_metadata", {}).get("model_info", {}).get("id", None),
+                    kwargs.get("litellm_metadata", {})
+                    .get("model_info", {})
+                    .get("id", None),
                )
            mapped_file_id: Optional[str] = None
            if input_file_id and model_file_id_mapping and model_id:
@ -648,7 +671,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
    ) -> List[str]:
        """
        Gets file ids from responses API input.
-        
+
        The input can be:
        - A string (no files)
        - A list of input items, where each item can have:
@ -656,32 +679,35 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
          - content: a list that can contain items with type: "input_file" and file_id
        """
        file_ids: List[str] = []
-        
+
        if isinstance(input, str):
            return file_ids
-        
+
        if not isinstance(input, list):
            return file_ids
-        
+
        for item in input:
            if not isinstance(item, dict):
                continue
-            
+
            # Check for direct input_file type
            if item.get("type") == "input_file":
                file_id = item.get("file_id")
                if file_id:
                    file_ids.append(file_id)
-            
+
            # Check for input_file in content array
            content = item.get("content")
            if isinstance(content, list):
                for content_item in content:
-                    if isinstance(content_item, dict) and content_item.get("type") == "input_file":
+                    if (
+                        isinstance(content_item, dict)
+                        and content_item.get("type") == "input_file"
+                    ):
                        file_id = content_item.get("file_id")
                        if file_id:
                            file_ids.append(file_id)
-        
+
        return file_ids

    def get_file_ids_from_responses_tools(
@ -689,7 +715,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
    ) -> List[str]:
        """
        Gets file ids from responses API tools parameter.
-        
+
        The tools can contain code_interpreter with container.file_ids:
        [
            {
@ -699,14 +725,14 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        ]
        """
        file_ids: List[str] = []
-        
+
        if not isinstance(tools, list):
            return file_ids
-        
+
        for tool in tools:
            if not isinstance(tool, dict):
                continue
-            
+
            # Check for code_interpreter with container file_ids
            if tool.get("type") == "code_interpreter":
                container = tool.get("container")
@ -716,7 +742,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                        for file_id in container_file_ids:
                            if isinstance(file_id, str):
                                file_ids.append(file_id)
-        
+
        return file_ids

    def get_vector_store_ids_from_file_search_tools(
@ -916,10 +942,17 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        # Emit Prometheus metrics for managed file creation
        prom_logger = self._get_prometheus_logger()
        if prom_logger:
-            first_model = target_model_names_list[0] if target_model_names_list else None
+            first_model = (
+                target_model_names_list[0] if target_model_names_list else None
+            )
            first_provider = ""
            if responses:
-                first_provider = getattr(responses[0], "_hidden_params", {}).get("custom_llm_provider") or ""
+                first_provider = (
+                    getattr(responses[0], "_hidden_params", {}).get(
+                        "custom_llm_provider"
+                    )
+                    or ""
+                )
            prom_logger.record_managed_file_created(
                model=first_model or "",
                api_provider=first_provider,
@ -1073,16 +1106,24 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                            model_name=resolved_model_name,
                        )
                        setattr(response, file_attr, unified_file_id)
-                        
+
                        # Use llm_router credentials when available. Without credentials,
                        # Azure and other auth-required providers return 500/401.
                        file_object = None
                        try:
                            # Import module and use getattr for better testability with mocks
                            import litellm.proxy.proxy_server as proxy_server_module
-                            _llm_router = getattr(proxy_server_module, 'llm_router', None)
+
+                            _llm_router = getattr(
+                                proxy_server_module, "llm_router", None
+                            )
                            if _llm_router is not None and model_id:
-                                _creds = _llm_router.get_deployment_credentials_with_provider(model_id) or {}
+                                _creds = (
+                                    _llm_router.get_deployment_credentials_with_provider(
+                                        model_id
+                                    )
+                                    or {}
+                                )
                                file_object = await litellm.afile_retrieve(
                                    file_id=original_file_id,
                                    **_creds,
@ -1099,7 +1140,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                            verbose_logger.warning(
                                f"Failed to retrieve file object for {file_attr}={original_file_id}: {str(e)}. Storing with None and will fetch on-demand."
                            )
-                        
+
                        await self.store_unified_file_id(
                            file_id=unified_file_id,
                            file_object=file_object,
@ -1128,6 +1169,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                            from litellm.litellm_core_utils.get_llm_provider_logic import (
                                get_llm_provider,
                            )
+
                            _, batch_provider, _, _ = get_llm_provider(model=model_name)
                        except Exception:
                            if "/" in model_name:
@ -1199,7 +1241,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        # Case 1 : This is not a managed file
        if not stored_file_object:
            raise Exception(f"LiteLLM Managed File object with id={file_id} not found")
-        
+
        # Case 2: Managed file and the file object exists in the database
        # The stored file_object has the raw provider ID. Replace with the unified ID
        # so callers see a consistent ID (matching Case 3 which does response.id = file_id).
@ -1217,13 +1259,21 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
            )

        try:
-            model_id, model_file_id = next(iter(stored_file_object.model_mappings.items()))
-            credentials = llm_router.get_deployment_credentials_with_provider(model_id) or {}
-            response = await litellm.afile_retrieve(file_id=model_file_id, **credentials)
+            model_id, model_file_id = next(
+                iter(stored_file_object.model_mappings.items())
+            )
+            credentials = (
+                llm_router.get_deployment_credentials_with_provider(model_id) or {}
+            )
+            response = await litellm.afile_retrieve(
+                file_id=model_file_id, **credentials
+            )
            response.id = file_id  # Replace with unified ID
            return response
        except Exception as e:
-            raise Exception(f"Failed to retrieve file {file_id} from provider: {str(e)}") from e
+            raise Exception(
+                f"Failed to retrieve file {file_id} from provider: {str(e)}"
+            ) from e

    async def afile_list(
        self,
@ -1245,19 +1295,19 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
            import litellm.proxy.proxy_server as proxy_server_module

            # Check if the scheduler has the batch cost checking job registered
-            scheduler = getattr(proxy_server_module, 'scheduler', None)
+            scheduler = getattr(proxy_server_module, "scheduler", None)
            if scheduler is None:
                return False
-            
+
            # Check if the check_batch_cost_job exists in the scheduler
            try:
-                job = scheduler.get_job('check_batch_cost_job')
+                job = scheduler.get_job("check_batch_cost_job")
                if job is not None:
                    return True
            except Exception:
                # Job not found or scheduler doesn't support get_job
                pass
-            
+
            return False
        except Exception as e:
            verbose_logger.warning(
@ -1265,28 +1315,26 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
            )
            return False

-    async def _get_batches_referencing_file(
-        self, file_id: str
-    ) -> List[Dict[str, Any]]:
+    async def _get_batches_referencing_file(self, file_id: str) -> List[Dict[str, Any]]:
        """
        Find batches that reference this file and still need cost tracking.
        Find batches that are in non-terminal state and have not yet been processed by CheckBatchCost.
        Args:
            file_id: The unified file ID to check
-            
+
        Returns:
            List of batch objects referencing this file in non-terminal state
            (max 10 for error message display)
        """
        # Prepare list of file IDs to check (both unified and provider IDs)
        file_ids_to_check = [file_id]
-        
+
        # Get model-specific file IDs for this unified file ID if it's a managed file
        try:
            model_file_id_mapping = await self.get_model_file_id_mapping(
                [file_id], litellm_parent_otel_span=None
            )
-            
+
            if model_file_id_mapping and file_id in model_file_id_mapping:
                # Add all provider file IDs for this unified file
                provider_file_ids = list(model_file_id_mapping[file_id].values())
@ -1296,59 +1344,67 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                f"Could not get model file ID mapping for {file_id}: {e}. "
                f"Will only check unified file ID."
            )
-        MAX_MATCHES_TO_RETURN = 10 
-        
+        MAX_MATCHES_TO_RETURN = 10
+
        batches = await self.prisma_client.db.litellm_managedobjecttable.find_many(
            where={
                "file_purpose": "batch",
                "batch_processed": False,
-                "status": {"not_in": ["failed", "expired", "cancelled"]}
+                "status": {"not_in": ["failed", "expired", "cancelled"]},
            },
            take=MAX_MATCHES_TO_RETURN,
            order={"created_at": "desc"},
        )
-        
+
        referencing_batches = []
        for batch in batches:
            try:
                # Parse the batch file_object to check for file references
-                batch_data = json.loads(batch.file_object) if isinstance(batch.file_object, str) else batch.file_object
-                
+                batch_data = (
+                    json.loads(batch.file_object)
+                    if isinstance(batch.file_object, str)
+                    else batch.file_object
+                )
+
                # Extract file IDs from batch
                # Batches typically reference the unified file ID in input_file_id
                # Output and error files are generated by the provider
                input_file_id = batch_data.get("input_file_id")
                output_file_id = batch_data.get("output_file_id")
                error_file_id = batch_data.get("error_file_id")
-                
-                referenced_file_ids = [fid for fid in [input_file_id, output_file_id, error_file_id] if fid]
-                
+
+                referenced_file_ids = [
+                    fid for fid in [input_file_id, output_file_id, error_file_id] if fid
+                ]
+
                # Check if any referenced file ID matches the file we're trying to delete
                if any(ref_id in file_ids_to_check for ref_id in referenced_file_ids):
-                    referencing_batches.append({
-                        "batch_id": batch.unified_object_id,
-                        "status": batch.status,
-                        "created_at": batch.created_at,
-                    })
+                    referencing_batches.append(
+                        {
+                            "batch_id": batch.unified_object_id,
+                            "status": batch.status,
+                            "created_at": batch.created_at,
+                        }
+                    )
            except Exception as e:
                verbose_logger.warning(
                    f"Error parsing batch object {batch.unified_object_id}: {e}"
                )
                continue
-        
+
        return referencing_batches

    async def _check_file_deletion_allowed(self, file_id: str) -> None:
        """
        Check if file deletion should be blocked due to batch references.
-        
+
        Blocks deletion if:
        1. File is referenced by any batch in non-terminal state, AND
        2. Batch polling is configured (user wants cost tracking)
-        
+
        Args:
            file_id: The unified file ID to check
-            
+
        Raises:
            HTTPException: If file deletion should be blocked
        """
@ -1356,39 +1412,45 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        if not self._is_batch_polling_enabled():
            # Batch polling not configured, allow deletion
            return
-        
+
        # Check if file is referenced by any non-terminal batches
        referencing_batches = await self._get_batches_referencing_file(file_id)
-        
+
        if referencing_batches:
            # File is referenced by non-terminal batches and polling is enabled
-            MAX_BATCHES_IN_ERROR = 5  # Limit batches shown in error message for readability
-            
+            # Limit the number of batches shown in the error message
+            # for readability.
+            MAX_BATCHES_IN_ERROR = 5
+
            # Show up to MAX_BATCHES_IN_ERROR in the error message
            batches_to_show = referencing_batches[:MAX_BATCHES_IN_ERROR]
-            batch_statuses = [f"{b['batch_id']}: {b['status']}" for b in batches_to_show]
-            
+            batch_statuses = [
+                f"{b['batch_id']}: {b['status']}" for b in batches_to_show
+            ]
+
            # Determine the count message
            count_message = f"{len(referencing_batches)}"
-            if len(referencing_batches) >= 10:  # MAX_MATCHES_TO_RETURN from _get_batches_referencing_file
+            # 10 mirrors MAX_MATCHES_TO_RETURN in
+            # _get_batches_referencing_file.
+            if len(referencing_batches) >= 10:
                count_message = "10+"
-            
+
            error_message = (
                f"Cannot delete file {file_id}. "
                f"The file is referenced by {count_message} batch(es) in non-terminal state"
            )
-            
+
            # Add specific batch details if not too many
            if len(referencing_batches) <= MAX_BATCHES_IN_ERROR:
                error_message += f": {', '.join(batch_statuses)}. "
            else:
                error_message += f" (showing {MAX_BATCHES_IN_ERROR} most recent): {', '.join(batch_statuses)}. "
-            
+
            error_message += (
                f"To delete this file before complete cost tracking, please delete or cancel the referencing batch(es) first. "
                f"Alternatively, wait for all batches to complete and for cost to be computed (batch_processed=true)."
            )
-            
+
            # Record blocked deletion metric
            prom_logger = self._get_prometheus_logger()
            if prom_logger:
@ -1419,7 +1481,9 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        specific_model_file_id_mapping = model_file_id_mapping.get(file_id)
        if specific_model_file_id_mapping:
            # Remove conflicting keys from data to avoid duplicate keyword arguments
-            filtered_data = {k: v for k, v in data.items() if k not in ("model", "file_id")}
+            filtered_data = {
+                k: v for k, v in data.items() if k not in ("model", "file_id")
+            }
            for model_id, model_file_id in specific_model_file_id_mapping.items():
                delete_response = await llm_router.afile_delete(model=model_id, file_id=model_file_id, **filtered_data)  # type: ignore

@ -1480,7 +1544,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
    ) -> None:
        """
        Convert files stored in storage backends to base64 format for Vertex AI/Gemini.
-        
+
        This method checks if any managed files are stored in storage backends,
        downloads them, and converts them to base64 format in the messages.
        """
@ -1488,29 +1552,29 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
        for file_id in file_ids:
            # Check if this is a base64 encoded unified file ID
            decoded_unified_file_id = _is_base64_encoded_unified_file_id(file_id)
-            
+
            if not decoded_unified_file_id:
                continue
-            
+
            # Check database for storage backend info
            # IMPORTANT: The database stores the base64 encoded unified_file_id (not the decoded version)
            # So we query with the original file_id (which is base64 encoded)
            db_file = await self.prisma_client.db.litellm_managedfiletable.find_first(
                where={"unified_file_id": file_id}
            )
-            
+
            if not db_file or not db_file.storage_backend or not db_file.storage_url:
                continue
-            
+
            # File is stored in a storage backend, download and convert to base64
            try:
                from litellm.llms.base_llm.files.storage_backend_factory import (
                    get_storage_backend,
                )
-                
+
                storage_backend_name = db_file.storage_backend
                storage_url = db_file.storage_url
-                
+
                # Get storage backend (uses same env vars as callback)
                try:
                    storage_backend = get_storage_backend(storage_backend_name)
@ -1519,18 +1583,22 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                        f"Storage backend '{storage_backend_name}' error for file {file_id}: {str(e)}"
                    )
                    continue
-                
+
                file_content = await storage_backend.download_file(storage_url)
-                
+
                # Determine content type from file object
-                content_type = self._get_content_type_from_file_object(db_file.file_object)
-                
+                content_type = self._get_content_type_from_file_object(
+                    db_file.file_object
+                )
+
                # Convert to base64
                base64_data = base64.b64encode(file_content).decode("utf-8")
                base64_data_uri = f"data:{content_type};base64,{base64_data}"
-                
+
                # Update messages to use base64 instead of file_id
-                self._update_messages_with_base64_data(messages, file_id, base64_data_uri, content_type)
+                self._update_messages_with_base64_data(
+                    messages, file_id, base64_data_uri, content_type
+                )
            except Exception as e:
                verbose_logger.exception(
                    f"Error converting file {file_id} from storage backend to base64: {str(e)}"
@ -1541,21 +1609,21 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
    def _get_content_type_from_file_object(self, file_object: Optional[Any]) -> str:
        """
        Determine content type from file object.
-        
+
        Uses the MIME type utility for consistent detection and normalization.
-        
+
        Args:
            file_object: The file object from the database (can be dict, JSON string, or None)
-        
+
        Returns:
            str: MIME type (defaults to "application/octet-stream" if cannot be determined)
        """
        # Use utility function for detection
        content_type = get_content_type_from_file_object(file_object)
-        
+
        # Normalize for Gemini/Vertex AI (requires image/jpeg, not image/jpg)
        content_type = normalize_mime_type_for_provider(content_type, provider="gemini")
-        
+
        return content_type

    def _update_messages_with_base64_data(
@ -1567,7 +1635,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
    ) -> None:
        """
        Update messages to replace file_id with base64 data URI.
-        
+
        Args:
            messages: List of messages to update
            file_id: The file ID to replace
@ -1582,7 +1650,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                        if element.get("type") == "file":
                            file_element = cast(ChatCompletionFileObject, element)
                            file_element_file = file_element.get("file", {})
-                            
+
                            if file_element_file.get("file_id") == file_id:
                                # Replace file_id with base64 data
                                file_element_file["file_data"] = base64_data_uri
@ -1590,7 +1658,7 @@ class _PROXY_LiteLLMManagedFiles(CustomLogger, BaseFileEndpoints):
                                file_element_file["format"] = content_type
                                # Remove file_id to ensure only file_data is used
                                file_element_file.pop("file_id", None)
-                                
+
                                verbose_logger.debug(
                                    f"Converted file {file_id} from storage backend to base64 with format {content_type}"
                                )
--- a/enterprise/litellm_enterprise/proxy/management_endpoints/project_endpoints.py
+++ b/enterprise/litellm_enterprise/proxy/management_endpoints/project_endpoints.py
@ -588,24 +588,21 @@ async def update_project(  # noqa: PLR0915
                param="project_id",
            )

-        # Validate team exists and get team object for limit + permission checks
-        team_id_to_check = data.team_id or existing_project.team_id
-        team_obj_for_checks = None
-        if team_id_to_check is not None:
-            team_obj_for_checks = await _validate_team_exists(
-                team_id=team_id_to_check, prisma_client=prisma_client
+        # Permission to *edit* the project must be evaluated against the
+        # project's CURRENT team. Sourcing the team from `data.team_id`
+        # would let an admin of any team pass the check by supplying their
+        # own team_id, hijacking the project (VERIA-55).
+        target_team_id = data.team_id or existing_project.team_id
+        target_team_obj = None
+        if target_team_id is not None:
+            target_team_obj = await _validate_team_exists(
+                team_id=target_team_id, prisma_client=prisma_client
            )

-        # Check if user has permission to update this project
        has_permission = await _check_user_permission_for_project(
            user_api_key_dict=user_api_key_dict,
            team_id=existing_project.team_id,
            prisma_client=prisma_client,
-            team_object=(
-                LiteLLM_TeamTable(**team_obj_for_checks.model_dump())
-                if team_obj_for_checks
-                else None
-            ),
        )

        if not has_permission:
@ -614,10 +611,32 @@ async def update_project(  # noqa: PLR0915
                detail={"error": "Only admins or team admins can update projects"},
            )

+        # Reassigning to a different team also requires admin rights on the
+        # destination team — otherwise a team admin could shed projects into
+        # an unsuspecting team's namespace.
+        if data.team_id is not None and data.team_id != existing_project.team_id:
+            can_assign_to_target = await _check_user_permission_for_project(
+                user_api_key_dict=user_api_key_dict,
+                team_id=data.team_id,
+                prisma_client=prisma_client,
+                team_object=(
+                    LiteLLM_TeamTable(**target_team_obj.model_dump())
+                    if target_team_obj
+                    else None
+                ),
+            )
+            if not can_assign_to_target:
+                raise HTTPException(
+                    status_code=403,
+                    detail={
+                        "error": "Cannot reassign project to a team you are not an admin of"
+                    },
+                )
+
        # Validate project limits against team limits
-        if team_obj_for_checks is not None:
+        if target_team_obj is not None:
            _check_team_project_limits(
-                team_object=LiteLLM_TeamTable(**team_obj_for_checks.model_dump()),
+                team_object=LiteLLM_TeamTable(**target_team_obj.model_dump()),
                data=data,
            )

@ -857,10 +876,16 @@ async def project_info(
                where={"team_id": project.team_id}
            )
            if team:
-                is_team_member = (
-                    user_api_key_dict.user_id in team.admins
-                    or user_api_key_dict.user_id in team.members
-                )
+                caller_user_id = user_api_key_dict.user_id
+                for m in team.members_with_roles or []:
+                    m_user_id = (
+                        m.get("user_id")
+                        if isinstance(m, dict)
+                        else getattr(m, "user_id", None)
+                    )
+                    if m_user_id == caller_user_id:
+                        is_team_member = True
+                        break

        if not (is_admin or is_team_member):
            raise HTTPException(
@ -911,20 +936,20 @@ async def list_projects(
                include={"litellm_budget_table": True, "object_permission": True}
            )
        else:
-            # Get projects for teams the user belongs to
-            user_teams = await prisma_client.db.litellm_teamtable.find_many(
-                where={
-                    "OR": [
-                        {"members": {"has": user_api_key_dict.user_id}},
-                        {"admins": {"has": user_api_key_dict.user_id}},
-                    ]
-                }
+            # Look up the user's team memberships via the reverse-index on
+            # LiteLLM_UserTable.teams (maintained by team_member_add alongside
+            # members_with_roles). This avoids a full scan of all team rows.
+            user_record = await prisma_client.db.litellm_usertable.find_unique(
+                where={"user_id": user_api_key_dict.user_id},
+            )
+            user_team_ids = (
+                user_record.teams
+                if user_record is not None and user_record.teams
+                else []
            )

-            team_ids = [team.team_id for team in user_teams]
-
            projects = await prisma_client.db.litellm_projecttable.find_many(
-                where={"team_id": {"in": team_ids}},
+                where={"team_id": {"in": user_team_ids}},
                include={"litellm_budget_table": True, "object_permission": True},
            )
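Reviewer note on the `project_info` hunk above: membership is now derived from `members_with_roles`, whose entries may be plain dicts or typed objects. The following is a minimal standalone sketch of that lookup, not the PR's code. The helper name `is_member_of_team` and the `Member` stand-in class are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class Member:
    # Hypothetical stand-in for a typed member entry; not a litellm class.
    user_id: Optional[str]
    role: str = "user"


def is_member_of_team(
    members_with_roles: Optional[Iterable[Any]],
    caller_user_id: Optional[str],
) -> bool:
    """Return True if caller_user_id appears in members_with_roles."""
    for m in members_with_roles or []:
        # Entries may be dicts (raw JSON from the DB) or attribute-style objects.
        m_user_id = (
            m.get("user_id") if isinstance(m, dict) else getattr(m, "user_id", None)
        )
        if m_user_id == caller_user_id:
            return True
    return False
```

A missing or empty `members_with_roles` yields `False`, mirroring the `or []` guard in the diff.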

--- a/enterprise/pyproject.toml
+++ b/enterprise/pyproject.toml
@ -1,6 +1,6 @@
 [project]
 name = "litellm-enterprise"
-version = "0.1.39"
+version = "0.1.40"
 description = "Package for LiteLLM Enterprise features"
 readme = "README.md"
 requires-python = ">=3.9"
@ -16,7 +16,7 @@ Repository = "https://github.com/BerriAI/litellm"
 Documentation = "https://docs.litellm.ai"

 [build-system]
-requires = ["uv_build==0.10.7"]
+requires = ["uv_build==0.11.8"]
 build-backend = "uv_build"

 [tool.uv]
@ -26,7 +26,7 @@ required-version = ">=0.10.9"
 module-root = ""

 [tool.commitizen]
-version = "0.1.39"
+version = "0.1.40"
 version_files = [
    "pyproject.toml:^version",
    "../pyproject.toml:litellm-enterprise==",
--- a/litellm-js/proxy/package-lock.json
+++ b/litellm-js/proxy/package-lock.json
--- a/litellm-js/proxy/package.json
+++ b/litellm-js/proxy/package.json
@ -4,11 +4,11 @@
    "deploy": "wrangler deploy --minify src/index.ts"
  },
  "dependencies": {
-    "hono": "4.12.12",
+    "hono": "4.12.16",
    "openai": "4.29.2"
  },
  "devDependencies": {
-    "@cloudflare/workers-types": "4.20240208.0",
-    "wrangler": "3.32.0"
+    "@cloudflare/workers-types": "4.20260501.1",
+    "wrangler": "4.87.0"
  }
 }
--- a/litellm-js/spend-logs/package-lock.json
+++ b/litellm-js/spend-logs/package-lock.json
@ -6,7 +6,7 @@
    "": {
      "dependencies": {
        "@hono/node-server": "1.19.13",
-        "hono": "4.12.12"
+        "hono": "4.12.16"
      },
      "devDependencies": {
        "@types/node": "20.19.25",
@ -548,9 +548,9 @@
      }
    },
    "node_modules/hono": {
-      "version": "4.12.12",
-      "resolved": "https://registry.npmjs.org/hono/-/hono-4.12.12.tgz",
-      "integrity": "sha512-p1JfQMKaceuCbpJKAPKVqyqviZdS0eUxH9v82oWo1kb9xjQ5wA6iP3FNVAPDFlz5/p7d45lO+BpSk1tuSZMF4Q==",
+      "version": "4.12.16",
+      "resolved": "https://registry.npmjs.org/hono/-/hono-4.12.16.tgz",
+      "integrity": "sha512-jN0ZewiNAWSe5khM3EyCmBb250+b40wWbwNILNfEvq84VREWwOIkuUsFONk/3i3nqkz7Oe1PcpM2mwQEK2L9Kg==",
      "license": "MIT",
      "engines": {
        "node": ">=16.9.0"
--- a/litellm-js/spend-logs/package.json
+++ b/litellm-js/spend-logs/package.json
@ -4,7 +4,7 @@
  },
  "dependencies": {
    "@hono/node-server": "1.19.13",
-    "hono": "4.12.12"
+    "hono": "4.12.16"
  },
  "devDependencies": {
    "@types/node": "20.19.25",
--- a/litellm-proxy-extras/litellm_proxy_extras/migrations/20260501195714_managed_resource_team_owner/migration.sql
+++ b/litellm-proxy-extras/litellm_proxy_extras/migrations/20260501195714_managed_resource_team_owner/migration.sql
@ -0,0 +1,20 @@
+-- Adds `team_id` to managed-resource tables so service-account API
+-- keys (no `user_id`) can be scoped by team instead of bypassing the
+-- `created_by` filter entirely. Existing rows keep `team_id = NULL`
+-- and become invisible to team-only callers — that is the intended isolation
+-- outcome; backfill manually if legacy rows must remain visible.
+--
+-- The composite indexes match the listing query: filter by team owner, sort by
+-- created_at DESC. Tables are typically small (resources per tenant, not per
+-- request); a future operator with a large table can switch to
+-- CREATE INDEX CONCURRENTLY in a follow-up migration.
+
+ALTER TABLE "LiteLLM_ManagedFileTable" ADD COLUMN IF NOT EXISTS "team_id" TEXT;
+ALTER TABLE "LiteLLM_ManagedObjectTable" ADD COLUMN IF NOT EXISTS "team_id" TEXT;
+ALTER TABLE "LiteLLM_ManagedVectorStoreTable" ADD COLUMN IF NOT EXISTS "team_id" TEXT;
+
+-- Index names follow Prisma's auto-generated convention so `prisma migrate diff`
+-- against the schema is clean.
+CREATE INDEX IF NOT EXISTS "LiteLLM_ManagedFileTable_team_id_created_at_idx" ON "LiteLLM_ManagedFileTable" ("team_id", "created_at" DESC);
+CREATE INDEX IF NOT EXISTS "LiteLLM_ManagedObjectTable_team_id_created_at_idx" ON "LiteLLM_ManagedObjectTable" ("team_id", "created_at" DESC);
+CREATE INDEX IF NOT EXISTS "LiteLLM_ManagedVectorStoreTable_team_id_created_at_idx" ON "LiteLLM_ManagedVectorStoreTable" ("team_id", "created_at" DESC);
--- a/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
+++ b/litellm-proxy-extras/litellm_proxy_extras/schema.prisma
@ -884,28 +884,32 @@ model LiteLLM_ManagedFileTable {
  storage_backend String? // Storage backend name (e.g., "azure_storage", "gcs", "default")
  storage_url String? // The actual storage URL where the file is stored
  created_at DateTime @default(now())
-  created_by String? 
+  created_by String?
+  team_id String? // Team that owns the resource; populated for service-account keys without a user_id so listings can isolate by team.
  updated_at DateTime @updatedAt
  updated_by String?

  @@index([unified_file_id])
+  @@index([team_id, created_at(sort: Desc)])
 }

-model LiteLLM_ManagedObjectTable { // for batches or finetuning jobs which use the 
+model LiteLLM_ManagedObjectTable { // for batches or finetuning jobs which use the
  id String @id @default(uuid())
  unified_object_id String @unique // The base64 encoded unified file ID
-  model_object_id String @unique // the id returned by the backend API provider 
+  model_object_id String @unique // the id returned by the backend API provider
  file_object Json // Stores the OpenAIFileObject
  file_purpose String // either 'batch' or 'fine-tune'
-  status String? // check if batch cost has been tracked  
+  status String? // check if batch cost has been tracked
  batch_processed Boolean @default(false) // set to true by CheckBatchCost after cost is computed
  created_at DateTime @default(now())
  created_by String?
+  team_id String?
  updated_at DateTime @updatedAt
-  updated_by String? 
+  updated_by String?

  @@index([unified_object_id])
  @@index([model_object_id])
+  @@index([team_id, created_at(sort: Desc)])
 }

 model LiteLLM_ManagedVectorStoreTable {
@ -918,10 +922,12 @@ model LiteLLM_ManagedVectorStoreTable {
  storage_url String? // Storage URL (if applicable)
  created_at DateTime @default(now())
  created_by String?
+  team_id String?
  updated_at DateTime @updatedAt
  updated_by String?

  @@index([unified_resource_id])
+  @@index([team_id, created_at(sort: Desc)])
 }

 model LiteLLM_ManagedVectorStoresTable {
--- a/litellm-proxy-extras/pyproject.toml
+++ b/litellm-proxy-extras/pyproject.toml
@ -1,6 +1,6 @@
 [project]
 name = "litellm-proxy-extras"
-version = "0.4.70"
+version = "0.4.71"
 description = "Additional files for the LiteLLM Proxy. Reduces the size of the main litellm package."
 readme = "README.md"
 requires-python = ">=3.9"
@ -16,7 +16,7 @@ Repository = "https://github.com/BerriAI/litellm"
 Documentation = "https://docs.litellm.ai"

 [build-system]
-requires = ["uv_build==0.10.7"]
+requires = ["uv_build==0.11.8"]
 build-backend = "uv_build"

 [tool.uv]
@ -26,7 +26,7 @@ required-version = ">=0.10.9"
 module-root = ""

 [tool.commitizen]
-version = "0.4.70"
+version = "0.4.71"
 version_files = [
    "pyproject.toml:^version",
    "../pyproject.toml:litellm-proxy-extras==",
--- a/litellm/__init__.py
+++ b/litellm/__init__.py
@ -166,7 +166,7 @@ langfuse_default_tags: Optional[List[str]] = None
 langsmith_batch_size: Optional[int] = None
 prometheus_initialize_budget_metrics: Optional[bool] = False
 prometheus_latency_buckets: Optional[List[float]] = None
-require_auth_for_metrics_endpoint: Optional[bool] = False
+require_auth_for_metrics_endpoint: Optional[bool] = True
 argilla_batch_size: Optional[int] = None
 datadog_use_v1: Optional[bool] = False  # if you want to use v1 datadog logged payload.
 gcs_pub_sub_use_v1: Optional[bool] = (
@ -280,6 +280,7 @@ ssl_security_level: Optional[str] = None
 ssl_certificate: Optional[str] = None
 user_url_validation: bool = True
 user_url_allowed_hosts: List[str] = []
+provider_url_destination_allowed_hosts: List[str] = []
 ssl_ecdh_curve: Optional[str] = (
    None  # Set to 'X25519' to disable PQC and improve performance
 )
@ -288,6 +289,7 @@ disable_token_counter: bool = False
 disable_add_transform_inline_image_block: bool = False
 disable_add_user_agent_to_request_tags: bool = False
 disable_anthropic_gemini_context_caching_transform: bool = False
+disable_vertex_batch_output_transformation: bool = False
 extra_spend_tag_headers: Optional[List[str]] = None
 in_memory_llm_clients_cache: "LLMClientCache"
 safe_memory_mode: bool = False
@ -330,6 +332,9 @@ enable_model_config_credential_overrides: bool = False
 enable_key_alias_format_validation: bool = (
    False  # opt-in validation of key_alias format on /key/generate and /key/update
 )
+enable_gemini_default_thinking_level_low: bool = (
+    False  # opt-in: force thinkingLevel low/minimal for Gemini 3 thinking param mapping
+)
 ####################
 logging: bool = True
 enable_loadbalancing_on_batch_endpoints: Optional[bool] = None
--- a/litellm/_logging.py
+++ b/litellm/_logging.py
@ -1,12 +1,12 @@
 import ast
 import logging
 import os
-import re
 import sys
 from datetime import datetime
 from logging import Formatter
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, Optional

+from litellm.litellm_core_utils.secret_redaction import redact_string
 from litellm.litellm_core_utils.safe_json_dumps import safe_dumps
 from litellm.litellm_core_utils.safe_json_loads import safe_json_loads

@ -21,74 +21,11 @@ _ENABLE_SECRET_REDACTION = (
    os.getenv("LITELLM_DISABLE_REDACT_SECRETS", "").lower() != "true"
 )

-_REDACTED = "REDACTED"
-
-
-def _build_secret_patterns() -> re.Pattern:
-    patterns: List[str] = [
-        # ── PEM private key / certificate blocks ──
-        r"-----BEGIN[A-Z \-]*PRIVATE KEY-----[\s\S]*?-----END[A-Z \-]*PRIVATE KEY-----",
-        # ── GCP OAuth2 access tokens (ya29.*) ──
-        r"\bya29\.[A-Za-z0-9_.~+/-]+",
-        # ── Credential %s formatting (space separator, no key= prefix) ──
-        r"(?:client_secret|azure_password|azure_username)\s+[^\s,'\"})\]{}>]+",
-        # AWS access key IDs
-        r"(?:AKIA|ASIA)[0-9A-Z]{16}",
-        # AWS secrets / session tokens / access key IDs (key=value)
-        r"(?:aws_secret_access_key|aws_session_token|aws_access_key_id)"
-        r"\s*[:=]\s*[A-Za-z0-9/+=]{20,}",
-        # Bearer tokens (OAuth, JWT, etc.)
-        r"Bearer\s+[A-Za-z0-9\-._~+/]{10,}=*",
-        # Basic auth headers
-        r"Basic\s+[A-Za-z0-9+/]{10,}={0,2}",
-        # OpenAI / Anthropic sk- prefixed keys
-        r"sk-[A-Za-z0-9\-_]{20,}",
-        # Generic api_key / api-key / apikey (handles 'key': 'value' dict repr)
-        r"(?:api[_-]?key)['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]{8,}",
-        # x-api-key / api-key header values (handles 'key': 'value' dict repr)
-        r"(?:x-api-key|api-key)['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+",
-        # Anthropic internal header keys
-        r"x-ak-[A-Za-z0-9\-_]{20,}",
-        # Google API keys
-        r"AIza[0-9A-Za-z\-_]{35}",
-        # Password / secret params (handles key=value and 'key': 'value')
-        # Word boundary prevents O(n^2) backtracking on long word-char runs.
-        r"(?:^|(?<=\W))\w*(?:password|passwd|client_secret|secret_key|_secret)"
-        r"['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+",
-        # Database connection string credentials (scheme://user:pass@host)
-        r"(?<=://)[^\s'\"]*:[^\s'\"@]+(?=@)",
-        # Databricks personal access tokens
-        r"dapi[0-9a-f]{32}",
-        # ── Key-name-based redaction ──
-        # Catches secrets inside dicts/config dumps by matching on the KEY name
-        # regardless of what the value looks like.
-        # e.g. 'master_key': 'any-value-here', "database_url": "postgres://..."
-        # private_key with PEM-aware value capture
-        r"""private_key['\"]?\s*[:=]\s*['\"]?(?:-----BEGIN[A-Z \-]*PRIVATE KEY-----[\s\S]*?-----END[A-Z \-]*PRIVATE KEY-----|[^\s,'\"})\]{}>]+)""",
-        r"(?:master_key|database_url|db_url|connection_string|"
-        r"signing_key|encryption_key|"
-        r"auth_token|access_token|refresh_token|"
-        r"slack_webhook_url|webhook_url|"
-        r"database_connection_string|"
-        r"huggingface_token|jwt_secret)"
-        r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+""",
-        # ── Raw JWTs (without Bearer prefix) ──
-        r"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*",
-        # ── Azure SAS tokens in URLs ──
-        r"[?&]sig=[A-Za-z0-9%+/=]+",
-        # ── Full JSON service-account blobs (single-line and multi-line) ──
-        r'\{[^{}]*"type"\s*:\s*"service_account"[^{}]*(?:\{[^{}]*\}[^{}]*)*\}',
-    ]
-    return re.compile("|".join(patterns), re.IGNORECASE)
-
-
-_SECRET_RE = _build_secret_patterns()
-

 def _redact_string(value: str) -> str:
    if not _ENABLE_SECRET_REDACTION:
        return value
-    return _SECRET_RE.sub(_REDACTED, value)
+    return redact_string(value)


 def redact_secrets(value: str) -> str:
--- a/litellm/anthropic_beta_headers_config.json
+++ b/litellm/anthropic_beta_headers_config.json
@ -72,7 +72,7 @@
    "computer-use-2025-11-24": "computer-use-2025-11-24",
    "context-1m-2025-08-07": "context-1m-2025-08-07",
    "context-management-2025-06-27": null,
-    "effort-2025-11-24": null,
+    "effort-2025-11-24": "effort-2025-11-24",
    "fast-mode-2026-02-01": null,
    "files-api-2025-04-14": null,
    "fine-grained-tool-streaming-2025-05-14": null,
@ -103,7 +103,7 @@
    "computer-use-2025-11-24": "computer-use-2025-11-24",
    "context-1m-2025-08-07": "context-1m-2025-08-07",
    "context-management-2025-06-27": null,
-    "effort-2025-11-24": null,
+    "effort-2025-11-24": "effort-2025-11-24",
    "fast-mode-2026-02-01": null,
    "files-api-2025-04-14": null,
    "fine-grained-tool-streaming-2025-05-14": null,
--- a/litellm/batches/batch_utils.py
+++ b/litellm/batches/batch_utils.py
@ -387,6 +387,27 @@ def _get_batch_job_total_usage_from_file_content(
    )


+def _get_models_from_batch_input_file_content(
+    file_content_dictionary: List[dict],
+) -> List[str]:
+    """Extract the distinct ``body.model`` values from a batch *input* file.
+
+    Used by the proxy's batch pre-call hook to enforce that the caller is
+    authorized for every model named inside the JSONL — not just the one
+    on the outer request — so the proxy's per-key model allowlist isn't
+    bypassed by smuggling expensive models into the batch file.
+    """
+    models: List[str] = []
+    seen: set = set()
+    for _item in file_content_dictionary:
+        body = _item.get("body") or {}
+        model = body.get("model")
+        if model and model not in seen:
+            seen.add(model)
+            models.append(model)
+    return models
+
+
 def _get_batch_job_input_file_usage(
    file_content_dictionary: List[dict],
    custom_llm_provider: Literal["openai", "azure", "vertex_ai"] = "openai",
@ -403,11 +424,25 @@ def _get_batch_job_input_file_usage(
    for _item in file_content_dictionary:
        body = _item.get("body", {})
        model = body.get("model", model_name or "")
-        messages = body.get("messages", [])

+        # Chat completion payloads.
+        messages = body.get("messages")
        if messages:
-            item_tokens = token_counter(model=model, messages=messages)
-            prompt_tokens += item_tokens
+            prompt_tokens += token_counter(model=model, messages=messages)
+            continue
+
+        # Text completion payloads (`prompt`).
+        prompt = body.get("prompt")
+        if prompt:
+            prompt_tokens += _count_prompt_or_input_tokens(model=model, value=prompt)
+            continue
+
+        # Embedding payloads (`input`).
+        input_data = body.get("input")
+        if input_data:
+            prompt_tokens += _count_prompt_or_input_tokens(
+                model=model, value=input_data
+            )

    return Usage(
        total_tokens=prompt_tokens + completion_tokens,
@ -416,6 +451,43 @@ def _get_batch_job_input_file_usage(
    )


+def _count_prompt_or_input_tokens(model: str, value: Any) -> int:
+    """Token-count a ``prompt`` / ``input`` field that the OpenAI batch
+    schema allows in four shapes:
+
+    - ``str``: a single text prompt.
+    - ``list[str]``: multiple text prompts.
+    - ``list[int]``: a pre-tokenized prompt (each int counts as 1 token).
+    - ``list[list[int]]``: multiple pre-tokenized prompts.
+
+    Before this fix, only the string shapes were counted, so a caller could
+    send a large ``list[list[int]]`` payload and slip past TPM rate limits
+    with a recorded cost of zero tokens.
+    """
+    if isinstance(value, str):
+        return token_counter(model=model, text=value)
+    if isinstance(value, list):
+        total = 0
+        for chunk in value:
+            if isinstance(chunk, str):
+                total += token_counter(model=model, text=chunk)
+            elif isinstance(chunk, int):
+                # Single pre-tokenized prompt at the top level: each
+                # int counts as one token.
+                total += 1
+            elif isinstance(chunk, list):
+                # Nested pre-tokenized prompt: every int contributes a
+                # token. Mixed string/int items still count.
+                total += sum(1 if isinstance(t, int) else 0 for t in chunk)
+                total += sum(
+                    token_counter(model=model, text=t)
+                    for t in chunk
+                    if isinstance(t, str)
+                )
+        return total
+    return 0
+
+
 def _get_batch_job_usage_from_response_body(response_body: dict) -> Usage:
    """
    Get the tokens of a batch job from the response body
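Reviewer note on `_count_prompt_or_input_tokens` above: the four accepted shapes can be exercised in isolation. This is a rough sketch under stated assumptions, not the PR's implementation. `count_tokens_stub` is a hypothetical whitespace-based stand-in for litellm's `token_counter`, and the function name drops the leading underscore to mark it as illustrative.

```python
from typing import Any, Callable


def count_tokens_stub(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: one token per
    # whitespace-separated word.
    return len(text.split())


def count_prompt_or_input_tokens(
    value: Any, count_text: Callable[[str], int] = count_tokens_stub
) -> int:
    """Count tokens for str, list[str], list[int], and list[list[int]] inputs."""
    if isinstance(value, str):
        return count_text(value)
    if not isinstance(value, list):
        return 0
    total = 0
    for chunk in value:
        if isinstance(chunk, str):
            total += count_text(chunk)
        elif isinstance(chunk, int):
            # Top-level pre-tokenized prompt: each int is one token.
            total += 1
        elif isinstance(chunk, list):
            # Nested pre-tokenized prompt; stray strings are still counted.
            for t in chunk:
                if isinstance(t, int):
                    total += 1
                elif isinstance(t, str):
                    total += count_text(t)
    return total
```

With the stub counter, `[[1, 2], [3, 4, 5]]` counts as five tokens, which is the case the docstring says was previously recorded as zero.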
--- a/litellm/caching/caching.py
+++ b/litellm/caching/caching.py
@ -432,9 +432,10 @@ class Cache:
            str: The final hashed cache key with the redis namespace.
        """
        dynamic_cache_control: DynamicCacheControl = kwargs.get("cache", {})
+        metadata = kwargs.get("metadata") or {}
        namespace = (
            dynamic_cache_control.get("namespace")
-            or kwargs.get("metadata", {}).get("redis_namespace")
+            or metadata.get("redis_namespace")
            or self.namespace
        )
        if namespace:
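Reviewer note on the `caching.py` hunk above: `kwargs.get("metadata", {})` only falls back to `{}` when the key is absent, so an explicit `metadata=None` reaches `.get(...)` on `None`. The sketch below contrasts the two forms. The function names `resolve_namespace` and `resolve_namespace_old` are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, Optional


def resolve_namespace_old(kwargs: Dict[str, Any]) -> Optional[str]:
    # Pre-change lookup: the {} default is used only when "metadata" is missing.
    return kwargs.get("metadata", {}).get("redis_namespace")


def resolve_namespace(
    kwargs: Dict[str, Any], default_namespace: Optional[str] = None
) -> Optional[str]:
    # Post-change lookup: `or {}` also covers an explicit None value.
    dynamic_cache_control = kwargs.get("cache", {})
    metadata = kwargs.get("metadata") or {}
    return (
        dynamic_cache_control.get("namespace")
        or metadata.get("redis_namespace")
        or default_namespace
    )


try:
    resolve_namespace_old({"metadata": None})
    old_form_raised = False
except AttributeError:
    # None has no .get(), which is the failure the change avoids.
    old_form_raised = True
```

The precedence is unchanged by the fix: the per-request `cache["namespace"]` wins, then `metadata["redis_namespace"]`, then the instance default.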
--- a/litellm/caching/caching_handler.py
+++ b/litellm/caching/caching_handler.py
@ -87,6 +87,18 @@ class CachingHandlerResponse(BaseModel):
 in_memory_cache_obj = InMemoryCache()


+def _should_defer_streaming_cache_hit_callbacks(*, kwargs: Dict[str, Any]) -> bool:
+    """
+    When stream=True, do not run success callbacks at cache-hit time.
+
+    Cached chat/text completion replay uses CustomStreamWrapper; cached Responses
+    replay uses CachedResponsesAPIStreamingIterator. Both invoke logging success
+    handlers when the stream finishes; firing them here too would double-count
+    spend and callback records.
+    """
+    return kwargs.get("stream", False) is True
+
+
 class LLMCachingHandler:
    def __init__(
        self,
@ -99,6 +111,7 @@ class LLMCachingHandler:
        self.async_streaming_chunks: List[ModelResponse] = []
        self.sync_streaming_chunks: List[ModelResponse] = []
        self.request_kwargs = request_kwargs
+        self.preset_cache_key: Optional[str] = None
        self.original_function = original_function
        self.start_time = start_time
        if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
@ -206,7 +219,7 @@ class LLMCachingHandler:
                        custom_llm_provider=kwargs.get("custom_llm_provider", None),
                        args=args,
                    )
-                    if kwargs.get("stream", False) is False:
+                    if not _should_defer_streaming_cache_hit_callbacks(kwargs=kwargs):
                        # LOG SUCCESS
                        self._async_log_cache_hit_on_callbacks(
                            logging_obj=logging_obj,
@ -215,11 +228,12 @@ class LLMCachingHandler:
                            end_time=end_time,
                            cache_hit=cache_hit,
                        )
-                    cache_key = litellm.cache.get_cache_key(**kwargs)
-                    if (
-                        isinstance(cached_result, BaseModel)
-                        or isinstance(cached_result, CustomStreamWrapper)
-                    ) and hasattr(cached_result, "_hidden_params"):
+                    cache_key = (
+                        self.preset_cache_key
+                        or self.request_kwargs.get("cache_key")
+                        or litellm.cache.get_cache_key(**self.request_kwargs)
+                    )
+                    if hasattr(cached_result, "_hidden_params"):
                        cached_result._hidden_params["cache_key"] = cache_key  # type: ignore
                    return CachingHandlerResponse(cached_result=cached_result)
                elif (
@ -265,8 +279,6 @@ class LLMCachingHandler:
        kwargs: Dict[str, Any],
        args: Optional[Tuple[Any, ...]] = None,
    ) -> CachingHandlerResponse:
-        from litellm.utils import CustomStreamWrapper
-
        cached_result: Optional[Any] = None

        # Check if caching should be performed BEFORE doing expensive kwargs copy
@ -282,6 +294,11 @@ class LLMCachingHandler:
                    args,
                )
            )
+            if new_kwargs.get("metadata") is None:
+                new_kwargs.pop("metadata", None)
+            if new_kwargs.get("stream") is True and "cache_key" not in new_kwargs:
+                new_kwargs["cache_key"] = litellm.cache.get_cache_key(**new_kwargs)
+            self.request_kwargs = new_kwargs
            print_verbose("Checking Sync Cache")
            cached_result = litellm.cache.get_cache(**new_kwargs)
            if cached_result is not None:
@ -322,17 +339,19 @@ class LLMCachingHandler:
                        is_async=False,
                    )

-                    logging_obj.handle_sync_success_callbacks_for_async_calls(
-                        result=cached_result,
-                        start_time=start_time,
-                        end_time=end_time,
-                        cache_hit=cache_hit,
+                    if not _should_defer_streaming_cache_hit_callbacks(kwargs=kwargs):
+                        logging_obj.handle_sync_success_callbacks_for_async_calls(
+                            result=cached_result,
+                            start_time=start_time,
+                            end_time=end_time,
+                            cache_hit=cache_hit,
+                        )
+                    cache_key = (
+                        self.preset_cache_key
+                        or self.request_kwargs.get("cache_key")
+                        or litellm.cache.get_cache_key(**self.request_kwargs)
                    )
-                    cache_key = litellm.cache.get_cache_key(**kwargs)
-                    if (
-                        isinstance(cached_result, BaseModel)
-                        or isinstance(cached_result, CustomStreamWrapper)
-                    ) and hasattr(cached_result, "_hidden_params"):
+                    if hasattr(cached_result, "_hidden_params"):
                        cached_result._hidden_params["cache_key"] = cache_key  # type: ignore
                    return CachingHandlerResponse(cached_result=cached_result)
        return CachingHandlerResponse(cached_result=cached_result)
@ -686,6 +705,11 @@ class LLMCachingHandler:
                args,
            )
        )
+        if new_kwargs.get("metadata") is None:
+            new_kwargs.pop("metadata", None)
+        if new_kwargs.get("stream") is True and "cache_key" not in new_kwargs:
+            new_kwargs["cache_key"] = litellm.cache.get_cache_key(**new_kwargs)
+        self.request_kwargs = new_kwargs
        cached_result: Optional[Any] = None
        if call_type == CallTypes.aembedding.value:
            if isinstance(new_kwargs["input"], str):
@ -710,14 +734,26 @@ class LLMCachingHandler:
                if all(result is None for result in cached_result):
                    cached_result = None
        else:
+            request_kwargs = new_kwargs.copy()
+            request_cache_key = request_kwargs.pop("cache_key", None)
            if litellm.cache._supports_async() is True:
                ## check if dual cache is supported ##
+                self.preset_cache_key = (
+                    request_cache_key or litellm.cache.get_cache_key(**request_kwargs)
+                )
                cached_result = await litellm.cache.async_get_cache(
-                    dynamic_cache_object=self.dual_cache, **new_kwargs
+                    dynamic_cache_object=self.dual_cache,
+                    cache_key=self.preset_cache_key,
+                    **request_kwargs,
                )
            else:  # fallback for caches that don't support async
+                self.preset_cache_key = (
+                    request_cache_key or litellm.cache.get_cache_key(**request_kwargs)
+                )
                cached_result = litellm.cache.get_cache(
-                    dynamic_cache_object=self.dual_cache, **new_kwargs
+                    dynamic_cache_object=self.dual_cache,
+                    cache_key=self.preset_cache_key,
+                    **request_kwargs,
                )
        return cached_result

@ -825,8 +861,27 @@ class LLMCachingHandler:
        elif (call_type == "aresponses" or call_type == "responses") and isinstance(
            cached_result, dict
        ):
-            # Convert cached dict back to ResponsesAPIResponse object
-            cached_result = ResponsesAPIResponse(**cached_result)
+            from litellm.responses.streaming_iterator import (
+                CachedResponsesAPIStreamingIterator,
+            )
+
+            response_obj = ResponsesAPIResponse(**cached_result)
+            if (
+                hasattr(response_obj, "_hidden_params")
+                and response_obj._hidden_params is not None
+                and isinstance(response_obj._hidden_params, dict)
+            ):
+                response_obj._hidden_params["cache_hit"] = True
+
+            if kwargs.get("stream", False) is True:
+                cached_result = CachedResponsesAPIStreamingIterator(
+                    response=response_obj,
+                    logging_obj=logging_obj,
+                    request_data=kwargs,
+                    call_type=call_type,
+                )
+            else:
+                cached_result = response_obj

        if (
            hasattr(cached_result, "_hidden_params")
--- a/litellm/caching/dual_cache.py
+++ b/litellm/caching/dual_cache.py
@ -92,6 +92,25 @@ class DualCache(BaseCache):
        if default_redis_ttl is not None:
            self.default_redis_ttl = default_redis_ttl

+    def attach_redis_cache(
+        self,
+        redis_cache: Optional[RedisCache] = None,
+        *,
+        default_redis_ttl: Optional[float] = None,
+    ) -> None:
+        """
+        Attach a Redis backend if this DualCache does not already have one.
+
+        No-op when ``redis_cache`` is None or when Redis was already set (constructor
+        or a prior attach). Use this for lazy wiring after a shared Redis client exists.
+        Does not backfill in-memory-only keys to Redis.
+        """
+        if redis_cache is None or self.redis_cache is not None:
+            return
+        self.redis_cache = redis_cache
+        if default_redis_ttl is not None:
+            self.default_redis_ttl = default_redis_ttl
+
    def set_cache(self, key, value, local_only: bool = False, **kwargs):
        # Update both Redis and in-memory cache
        try:
--- a/litellm/caching/qdrant_semantic_cache.py
+++ b/litellm/caching/qdrant_semantic_cache.py
@ -11,17 +11,23 @@ Has 4 methods:
 import ast
 import asyncio
 import json
-from typing import Any, cast
+import os
+from typing import Any, Dict, cast

 import litellm
 from litellm._logging import print_verbose
 from litellm.constants import QDRANT_SCALAR_QUANTILE, QDRANT_VECTOR_SIZE
+from litellm.litellm_core_utils.prompt_templates.common_utils import (
+    get_str_from_messages,
+)
 from litellm.types.utils import EmbeddingResponse

 from .base_cache import BaseCache


 class QdrantSemanticCache(BaseCache):
+    CACHE_KEY_FIELD_NAME = "litellm_cache_key"
+
    def __init__(  # noqa: PLR0915
        self,
        qdrant_api_base=None,
@ -33,8 +39,6 @@ class QdrantSemanticCache(BaseCache):
        host_type=None,
        vector_size=None,
    ):
-        import os
-
        from litellm.llms.custom_httpx.http_handler import (
            _get_httpx_client,
            get_async_httpx_client,
@ -115,7 +119,9 @@ class QdrantSemanticCache(BaseCache):
            print_verbose(
                f"Collection already exists.\nCollection details:{self.collection_info}"
            )
+            self._ensure_cache_key_payload_index()
        else:
+            quantization_params: Dict[str, Any]
            if quantization_config is None or quantization_config == "binary":
                quantization_params = {
                    "binary": {
@ -156,6 +162,7 @@ class QdrantSemanticCache(BaseCache):
                print_verbose(
                    f"New collection created.\nCollection details:{self.collection_info}"
                )
+                self._ensure_cache_key_payload_index()
            else:
                raise Exception("Error while creating new collection")

@ -170,15 +177,94 @@ class QdrantSemanticCache(BaseCache):
            cached_response = ast.literal_eval(cached_response)
        return cached_response

+    def _get_qdrant_cache_key_filter(self, key: str) -> dict:
+        return {
+            "must": [
+                {
+                    "key": self.CACHE_KEY_FIELD_NAME,
+                    "match": {"value": str(key)},
+                }
+            ]
+        }
+
+    def _add_cache_key_filter_to_search_data(self, data: dict, key: str) -> None:
+        data["filter"] = self._get_qdrant_cache_key_filter(key)
+
+    def _ensure_cache_key_payload_index(self) -> None:
+        try:
+            response = self.sync_client.put(
+                url=f"{self.qdrant_api_base}/collections/{self.collection_name}/index",
+                headers=self.headers,
+                json={
+                    "field_name": self.CACHE_KEY_FIELD_NAME,
+                    "field_schema": "keyword",
+                },
+            )
+            if response.status_code not in (200, 201):
+                print_verbose(
+                    "Qdrant semantic-cache could not create cache-key payload index: "
+                    f"{response.text}"
+                )
+        except Exception as exc:
+            print_verbose(
+                "Qdrant semantic-cache could not create cache-key payload index: "
+                f"{str(exc)}"
+            )
+
+    def _payload_matches_cache_key(self, payload: dict, key: str) -> bool:
+        # Pre-isolation points stored only prompt + response with no cache-key
+        # payload field. Reassigning them to a caller's key would risk
+        # cross-scope hits, so they're treated as misses and re-populated on
+        # the next set_cache.
+        cached_key = payload.get(self.CACHE_KEY_FIELD_NAME)
+        return cached_key is not None and str(cached_key) == str(key)
+
+    async def _get_async_embedding(self, prompt: str, **kwargs) -> Any:
+        llm_model_list = None
+        llm_router = None
+
+        try:
+            from litellm.proxy.proxy_server import (
+                llm_model_list as proxy_llm_model_list,
+                llm_router as proxy_llm_router,
+            )
+
+            llm_model_list = proxy_llm_model_list
+            llm_router = proxy_llm_router
+        except ImportError:
+            pass
+
+        router_model_names = (
+            [m["model_name"] for m in llm_model_list]
+            if llm_model_list is not None
+            else []
+        )
+        if llm_router is not None and self.embedding_model in router_model_names:
+            user_api_key = kwargs.get("metadata", {}).get("user_api_key", "")
+            return await llm_router.aembedding(
+                model=self.embedding_model,
+                input=prompt,
+                cache={"no-store": True, "no-cache": True},
+                metadata={
+                    "user_api_key": user_api_key,
+                    "semantic-cache-embedding": True,
+                    "trace_id": kwargs.get("metadata", {}).get("trace_id", None),
+                },
+            )
+
+        return await litellm.aembedding(
+            model=self.embedding_model,
+            input=prompt,
+            cache={"no-store": True, "no-cache": True},
+        )
+
    def set_cache(self, key, value, **kwargs):
        print_verbose(f"qdrant semantic-cache set_cache, kwargs: {kwargs}")
        from litellm._uuid import uuid

        # get the prompt
        messages = kwargs["messages"]
-        prompt = ""
-        for message in messages:
-            prompt += message["content"]
+        prompt = get_str_from_messages(messages)

        # create an embedding for prompt
        embedding_response = cast(
@ -202,6 +288,7 @@ class QdrantSemanticCache(BaseCache):
                    "id": str(uuid.uuid4()),
                    "vector": embedding,
                    "payload": {
+                        self.CACHE_KEY_FIELD_NAME: str(key),
                        "text": prompt,
                        "response": value,
                    },
@ -220,9 +307,7 @@ class QdrantSemanticCache(BaseCache):

        # get the messages
        messages = kwargs["messages"]
-        prompt = ""
-        for message in messages:
-            prompt += message["content"]
+        prompt = get_str_from_messages(messages)

        # convert to embedding
        embedding_response = cast(
@ -249,6 +334,7 @@ class QdrantSemanticCache(BaseCache):
            "limit": 1,
            "with_payload": True,
        }
+        self._add_cache_key_filter_to_search_data(data=data, key=key)

        search_response = self.sync_client.post(
            url=f"{self.qdrant_api_base}/collections/{self.collection_name}/points/search",
@ -258,21 +344,33 @@ class QdrantSemanticCache(BaseCache):
        results = search_response.json()["result"]

        if results is None:
+            kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
            return None
        if isinstance(results, list):
            if len(results) == 0:
+                kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
                return None

        similarity = results[0]["score"]
-        cached_prompt = results[0]["payload"]["text"]
+        payload = results[0]["payload"]
+        if not self._payload_matches_cache_key(payload=payload, key=key):
+            print_verbose("Qdrant semantic-cache hit did not match cache key scope")
+            kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
+            return None
+
+        cached_prompt = payload["text"]

        # check similarity, if more than self.similarity_threshold, return results
        print_verbose(
            f"semantic cache: similarity threshold: {self.similarity_threshold}, similarity: {similarity}, prompt: {prompt}, closest_cached_prompt: {cached_prompt}"
        )
+
+        # Record the similarity in kwargs["metadata"] without replacing the existing metadata dict
+        kwargs.setdefault("metadata", {})["semantic-similarity"] = similarity
+
        if similarity >= self.similarity_threshold:
            # cache hit !
-            cached_value = results[0]["payload"]["response"]
+            cached_value = payload["response"]
            print_verbose(
                f"got a cache hit, similarity: {similarity}, Current prompt: {prompt}, cached_prompt: {cached_prompt}"
            )
@ -285,40 +383,12 @@ class QdrantSemanticCache(BaseCache):
    async def async_set_cache(self, key, value, **kwargs):
        from litellm._uuid import uuid

-        from litellm.proxy.proxy_server import llm_model_list, llm_router
-
        print_verbose(f"async qdrant semantic-cache set_cache, kwargs: {kwargs}")

        # get the prompt
        messages = kwargs["messages"]
-        prompt = ""
-        for message in messages:
-            prompt += message["content"]
-        # create an embedding for prompt
-        router_model_names = (
-            [m["model_name"] for m in llm_model_list]
-            if llm_model_list is not None
-            else []
-        )
-        if llm_router is not None and self.embedding_model in router_model_names:
-            user_api_key = kwargs.get("metadata", {}).get("user_api_key", "")
-            embedding_response = await llm_router.aembedding(
-                model=self.embedding_model,
-                input=prompt,
-                cache={"no-store": True, "no-cache": True},
-                metadata={
-                    "user_api_key": user_api_key,
-                    "semantic-cache-embedding": True,
-                    "trace_id": kwargs.get("metadata", {}).get("trace_id", None),
-                },
-            )
-        else:
-            # convert to embedding
-            embedding_response = await litellm.aembedding(
-                model=self.embedding_model,
-                input=prompt,
-                cache={"no-store": True, "no-cache": True},
-            )
+        prompt = get_str_from_messages(messages)
+        embedding_response = await self._get_async_embedding(prompt, **kwargs)

        # get the embedding
        embedding = embedding_response["data"][0]["embedding"]
@ -332,6 +402,7 @@ class QdrantSemanticCache(BaseCache):
                    "id": str(uuid.uuid4()),
                    "vector": embedding,
                    "payload": {
+                        self.CACHE_KEY_FIELD_NAME: str(key),
                        "text": prompt,
                        "response": value,
                    },
@ -348,38 +419,12 @@ class QdrantSemanticCache(BaseCache):

    async def async_get_cache(self, key, **kwargs):
        print_verbose(f"async qdrant semantic-cache get_cache, kwargs: {kwargs}")
-        from litellm.proxy.proxy_server import llm_model_list, llm_router

        # get the messages
        messages = kwargs["messages"]
-        prompt = ""
-        for message in messages:
-            prompt += message["content"]
+        prompt = get_str_from_messages(messages)

-        router_model_names = (
-            [m["model_name"] for m in llm_model_list]
-            if llm_model_list is not None
-            else []
-        )
-        if llm_router is not None and self.embedding_model in router_model_names:
-            user_api_key = kwargs.get("metadata", {}).get("user_api_key", "")
-            embedding_response = await llm_router.aembedding(
-                model=self.embedding_model,
-                input=prompt,
-                cache={"no-store": True, "no-cache": True},
-                metadata={
-                    "user_api_key": user_api_key,
-                    "semantic-cache-embedding": True,
-                    "trace_id": kwargs.get("metadata", {}).get("trace_id", None),
-                },
-            )
-        else:
-            # convert to embedding
-            embedding_response = await litellm.aembedding(
-                model=self.embedding_model,
-                input=prompt,
-                cache={"no-store": True, "no-cache": True},
-            )
+        embedding_response = await self._get_async_embedding(prompt, **kwargs)

        # get the embedding
        embedding = embedding_response["data"][0]["embedding"]
@ -396,6 +441,7 @@ class QdrantSemanticCache(BaseCache):
            "limit": 1,
            "with_payload": True,
        }
+        self._add_cache_key_filter_to_search_data(data=data, key=key)

        search_response = await self.async_client.post(
            url=f"{self.qdrant_api_base}/collections/{self.collection_name}/points/search",
@ -414,7 +460,13 @@ class QdrantSemanticCache(BaseCache):
                return None

        similarity = results[0]["score"]
-        cached_prompt = results[0]["payload"]["text"]
+        payload = results[0]["payload"]
+        if not self._payload_matches_cache_key(payload=payload, key=key):
+            print_verbose("Qdrant semantic-cache hit did not match cache key scope")
+            kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
+            return None
+
+        cached_prompt = payload["text"]

        # check similarity, if more than self.similarity_threshold, return results
        print_verbose(
@ -426,7 +478,7 @@ class QdrantSemanticCache(BaseCache):

        if similarity >= self.similarity_threshold:
            # cache hit !
-            cached_value = results[0]["payload"]["response"]
+            cached_value = payload["response"]
            print_verbose(
                f"got a cache hit, similarity: {similarity}, Current prompt: {prompt}, cached_prompt: {cached_prompt}"
            )
--- a/litellm/caching/redis_cache.py
+++ b/litellm/caching/redis_cache.py
@ -551,6 +551,13 @@ class RedisCache(BaseCache):
    async def async_set_cache(self, key, value, **kwargs):
        from redis.asyncio import Redis

+        if key is None:
+            verbose_logger.debug(
+                "LiteLLM Redis Caching: async set() skipped — key is None, value=%r",
+                value,
+            )
+            return None
+
        start_time = time.time()
        try:
            _redis_client: Redis = self.init_async_client()  # type: ignore
@ -569,8 +576,9 @@ class RedisCache(BaseCache):
                )
            )
            verbose_logger.error(
-                "LiteLLM Redis Caching: async set() - Got exception from REDIS %s, Writing value=%s",
+                "LiteLLM Redis Caching: async set() - Got exception from REDIS %s, key=%r, value=%r",
                str(e),
+                key,
                value,
            )
            raise e
--- a/litellm/caching/redis_semantic_cache.py
+++ b/litellm/caching/redis_semantic_cache.py
@ -35,6 +35,7 @@ class RedisSemanticCache(BaseCache):
    """

    DEFAULT_REDIS_INDEX_NAME: str = "litellm_semantic_cache_index"
+    CACHE_KEY_FIELD_NAME: str = "litellm_cache_key"

    def __init__(
        self,
@ -66,8 +67,8 @@ class RedisSemanticCache(BaseCache):
            Exception: If similarity_threshold is not provided or required Redis
                connection information is missing
        """
-        from redisvl.extensions.llmcache import SemanticCache
-        from redisvl.utils.vectorize import CustomTextVectorizer
+        from redisvl.extensions.llmcache import SemanticCache  # type: ignore[import-not-found, import-untyped]
+        from redisvl.utils.vectorize import CustomTextVectorizer  # type: ignore[import-not-found, import-untyped]

        if index_name is None:
            index_name = self.DEFAULT_REDIS_INDEX_NAME
@ -109,14 +110,94 @@ class RedisSemanticCache(BaseCache):
        # Initialize the Redis vectorizer and cache
        cache_vectorizer = CustomTextVectorizer(self._get_embedding)

-        self.llmcache = SemanticCache(
-            name=index_name,
+        self.llmcache = self._init_semantic_cache(
+            semantic_cache_cls=SemanticCache,
+            index_name=index_name,
            redis_url=redis_url,
-            vectorizer=cache_vectorizer,
-            distance_threshold=self.distance_threshold,
-            overwrite=False,
+            cache_vectorizer=cache_vectorizer,
        )

+    @classmethod
+    def _cache_key_filterable_field(cls) -> Dict[str, str]:
+        return {
+            "name": cls.CACHE_KEY_FIELD_NAME,
+            "type": "tag",
+        }
+
+    def _init_semantic_cache(
+        self,
+        semantic_cache_cls: Any,
+        index_name: str,
+        redis_url: str,
+        cache_vectorizer: Any,
+    ) -> Any:
+        def _is_schema_mismatch(exc: ValueError) -> bool:
+            error_message = str(exc).lower()
+            return any(
+                phrase in error_message
+                for phrase in ("schema does not match", "index schema")
+            )
+
+        try:
+            return semantic_cache_cls(
+                name=index_name,
+                redis_url=redis_url,
+                vectorizer=cache_vectorizer,
+                distance_threshold=self.distance_threshold,
+                filterable_fields=[self._cache_key_filterable_field()],
+                overwrite=False,
+            )
+        except ValueError as exc:
+            if not _is_schema_mismatch(exc):
+                raise
+
+            isolated_index_name = f"{index_name}_isolated"
+            print_verbose(
+                "Redis semantic-cache existing index schema is not isolated; "
+                f"using isolated index - {isolated_index_name}"
+            )
+            try:
+                return semantic_cache_cls(
+                    name=isolated_index_name,
+                    redis_url=redis_url,
+                    vectorizer=cache_vectorizer,
+                    distance_threshold=self.distance_threshold,
+                    filterable_fields=[self._cache_key_filterable_field()],
+                    overwrite=False,
+                )
+            except ValueError as isolated_exc:
+                if not _is_schema_mismatch(isolated_exc):
+                    raise
+
+                print_verbose(
+                    "Redis semantic-cache isolated index schema is stale; "
+                    f"recreating isolated index - {isolated_index_name}"
+                )
+                return semantic_cache_cls(
+                    name=isolated_index_name,
+                    redis_url=redis_url,
+                    vectorizer=cache_vectorizer,
+                    distance_threshold=self.distance_threshold,
+                    filterable_fields=[self._cache_key_filterable_field()],
+                    overwrite=True,
+                )
+
+    def _get_cache_filters(self, key: str) -> Dict[str, str]:
+        return {self.CACHE_KEY_FIELD_NAME: str(key)}
+
+    def _get_cache_key_filter_expression(self, key: str) -> Any:
+        from redisvl.query.filter import Tag  # type: ignore[import-not-found, import-untyped]
+
+        return Tag(self.CACHE_KEY_FIELD_NAME) == str(key)
+
+    def _cache_hit_matches_key(self, cache_hit: Dict[str, Any], key: str) -> bool:
+        # Pre-isolation entries with no ``litellm_cache_key`` field cannot be
+        # safely reassigned to a caller's scope and are treated as misses.
+        cached_key = cache_hit.get(self.CACHE_KEY_FIELD_NAME)
+        if isinstance(cached_key, bytes):
+            cached_key = cached_key.decode("utf-8")
+        return cached_key is not None and str(cached_key) == str(key)
+
    def _get_ttl(self, **kwargs) -> Optional[int]:
        """
        Get the TTL (time-to-live) value for cache entries.
@ -188,7 +269,7 @@ class RedisSemanticCache(BaseCache):
        Store a value in the semantic cache.

        Args:
-            key: The cache key (not directly used in semantic caching)
+            key: The cache key used to isolate semantic cache entries
            value: The response value to cache
            **kwargs: Additional arguments including 'messages' for the prompt
                and optional 'ttl' for time-to-live
@ -206,12 +287,15 @@ class RedisSemanticCache(BaseCache):
            prompt = get_str_from_messages(messages)
            value_str = str(value)

+            store_kwargs: Dict[str, Any] = {
+                "filters": self._get_cache_filters(key),
+            }
+
            # Get TTL and store in Redis semantic cache
            ttl = self._get_ttl(**kwargs)
            if ttl is not None:
-                self.llmcache.store(prompt, value_str, ttl=int(ttl))
-            else:
-                self.llmcache.store(prompt, value_str)
+                store_kwargs["ttl"] = int(ttl)
+            self.llmcache.store(prompt, value_str, **store_kwargs)
        except Exception as e:
            print_verbose(
                f"Error setting {value_str or value} in the Redis semantic cache: {str(e)}"
@ -222,7 +306,7 @@ class RedisSemanticCache(BaseCache):
        Retrieve a semantically similar cached response.

        Args:
-            key: The cache key (not directly used in semantic caching)
+            key: The cache key used to isolate semantic cache entries
            **kwargs: Additional arguments including 'messages' for the prompt

        Returns:
@ -235,18 +319,29 @@ class RedisSemanticCache(BaseCache):
            messages = kwargs.get("messages", [])
            if not messages:
                print_verbose("No messages provided for semantic cache lookup")
+                kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
                return None

            prompt = get_str_from_messages(messages)
-            # Check the cache for semantically similar prompts
-            results = self.llmcache.check(prompt=prompt)
+            # Check the cache for semantically similar prompts in this exact
+            # LiteLLM cache-key scope.
+            check_kwargs: Dict[str, Any] = {
+                "prompt": prompt,
+                "filter_expression": self._get_cache_key_filter_expression(key),
+            }
+            results = self.llmcache.check(**check_kwargs)

            # Return None if no similar prompts found
            if not results:
+                kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
                return None

            # Process the best matching result
            cache_hit = results[0]
+            if not self._cache_hit_matches_key(cache_hit=cache_hit, key=key):
+                print_verbose("Redis semantic-cache hit did not match cache key scope")
+                kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
+                return None
            vector_distance = float(cache_hit["vector_distance"])

            # Convert vector distance back to similarity score
@ -257,6 +352,9 @@ class RedisSemanticCache(BaseCache):
            cached_prompt = cache_hit["prompt"]
            cached_response = cache_hit["response"]

+            # Record the similarity in kwargs["metadata"] without replacing the existing metadata dict
+            kwargs.setdefault("metadata", {})["semantic-similarity"] = similarity
+
            print_verbose(
                f"Cache hit: similarity threshold: {self.similarity_threshold}, "
                f"actual similarity: {similarity}, "
@ -267,6 +365,7 @@ class RedisSemanticCache(BaseCache):
            return self._get_cache_logic(cached_response=cached_response)
        except Exception as e:
            print_verbose(f"Error retrieving from Redis semantic cache: {str(e)}")
+            kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0

    async def _get_async_embedding(self, prompt: str, **kwargs) -> List[float]:
        """
@ -321,7 +420,7 @@ class RedisSemanticCache(BaseCache):
        Asynchronously store a value in the semantic cache.

        Args:
-            key: The cache key (not directly used in semantic caching)
+            key: The cache key used to isolate semantic cache entries
            value: The response value to cache
            **kwargs: Additional arguments including 'messages' for the prompt
                and optional 'ttl' for time-to-live
@ -341,21 +440,20 @@ class RedisSemanticCache(BaseCache):
            # Generate embedding for the value (response) to cache
            prompt_embedding = await self._get_async_embedding(prompt, **kwargs)

+            store_kwargs: Dict[str, Any] = {
+                "vector": prompt_embedding,
+                "filters": self._get_cache_filters(key),
+            }
+
            # Get TTL and store in Redis semantic cache
            ttl = self._get_ttl(**kwargs)
            if ttl is not None:
-                await self.llmcache.astore(
-                    prompt,
-                    value_str,
-                    vector=prompt_embedding,  # Pass through custom embedding
-                    ttl=ttl,
-                )
-            else:
-                await self.llmcache.astore(
-                    prompt,
-                    value_str,
-                    vector=prompt_embedding,  # Pass through custom embedding
-                )
+                store_kwargs["ttl"] = ttl
+            await self.llmcache.astore(
+                prompt,
+                value_str,
+                **store_kwargs,
+            )
        except Exception as e:
            print_verbose(f"Error in async_set_cache: {str(e)}")

@ -364,7 +462,7 @@ class RedisSemanticCache(BaseCache):
        Asynchronously retrieve a semantically similar cached response.

        Args:
-            key: The cache key (not directly used in semantic caching)
+            key: The cache key used to isolate semantic cache entries
            **kwargs: Additional arguments including 'messages' for the prompt

        Returns:
@ -385,17 +483,25 @@ class RedisSemanticCache(BaseCache):
            # Generate embedding for the prompt
            prompt_embedding = await self._get_async_embedding(prompt, **kwargs)

-            # Check the cache for semantically similar prompts
-            results = await self.llmcache.acheck(prompt=prompt, vector=prompt_embedding)
+            # Check the cache for semantically similar prompts in this exact
+            # LiteLLM cache-key scope.
+            check_kwargs: Dict[str, Any] = {
+                "prompt": prompt,
+                "vector": prompt_embedding,
+                "filter_expression": self._get_cache_key_filter_expression(key),
+            }
+            results = await self.llmcache.acheck(**check_kwargs)

            # handle results / cache hit
            if not results:
-                kwargs.setdefault("metadata", {})[
-                    "semantic-similarity"
-                ] = 0.0  # TODO why here but not above??
+                kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
                return None

            cache_hit = results[0]
+            if not self._cache_hit_matches_key(cache_hit=cache_hit, key=key):
+                print_verbose("Redis semantic-cache hit did not match cache key scope")
+                kwargs.setdefault("metadata", {})["semantic-similarity"] = 0.0
+                return None
            vector_distance = float(cache_hit["vector_distance"])

            # Convert vector distance back to similarity
--- a/litellm/constants.py
+++ b/litellm/constants.py
@ -202,6 +202,12 @@ DEFAULT_REASONING_EFFORT_MEDIUM_THINKING_BUDGET = int(
 DEFAULT_REASONING_EFFORT_HIGH_THINKING_BUDGET = int(
    os.getenv("DEFAULT_REASONING_EFFORT_HIGH_THINKING_BUDGET", 4096)
 )
+DEFAULT_REASONING_EFFORT_XHIGH_THINKING_BUDGET = int(
+    os.getenv("DEFAULT_REASONING_EFFORT_XHIGH_THINKING_BUDGET", 8192)
+)
+DEFAULT_REASONING_EFFORT_MAX_THINKING_BUDGET = int(
+    os.getenv("DEFAULT_REASONING_EFFORT_MAX_THINKING_BUDGET", 16384)
+)
 MAX_TOKEN_TRIMMING_ATTEMPTS = int(
    os.getenv("MAX_TOKEN_TRIMMING_ATTEMPTS", 10)
 )  # Maximum number of attempts to trim the message
@ -399,6 +405,8 @@ BEDROCK_MAX_POLICY_SIZE = int(os.getenv("BEDROCK_MAX_POLICY_SIZE", 75))
 BEDROCK_MIN_THINKING_BUDGET_TOKENS = int(
    os.getenv("BEDROCK_MIN_THINKING_BUDGET_TOKENS", 1024)
 )
+# Anthropic's Messages API rejects thinking.budget_tokens < 1024.
+ANTHROPIC_MIN_THINKING_BUDGET_TOKENS = 1024
 REPLICATE_POLLING_DELAY_SECONDS = float(
    os.getenv("REPLICATE_POLLING_DELAY_SECONDS", 0.5)
 )
@ -419,9 +427,6 @@ CACHED_STREAMING_CHUNK_DELAY = float(os.getenv("CACHED_STREAMING_CHUNK_DELAY", 0
 AUDIO_SPEECH_CHUNK_SIZE = int(
    os.getenv("AUDIO_SPEECH_CHUNK_SIZE", 8192)
 )  # chunk_size for audio speech streaming. Balance between latency and memory usage
-MAX_SIZE_PER_ITEM_IN_MEMORY_CACHE_IN_KB = int(
-    os.getenv("MAX_SIZE_PER_ITEM_IN_MEMORY_CACHE_IN_KB", 512)
-)
 DEFAULT_MAX_TOKENS_FOR_TRITON = int(os.getenv("DEFAULT_MAX_TOKENS_FOR_TRITON", 2000))
 #### Networking settings ####
 # Sentinel used when `REQUEST_TIMEOUT` is unset: `litellm.request_timeout` keeps this
--- a/litellm/cost_calculator.py
+++ b/litellm/cost_calculator.py
@ -513,7 +513,10 @@ def cost_per_token(  # noqa: PLR0915
        return fireworks_ai_cost_per_token(model=model, usage=usage_block)
    elif custom_llm_provider == "azure":
        return azure_openai_cost_per_token(
-            model=model, usage=usage_block, response_time_ms=response_time_ms
+            model=model,
+            usage=usage_block,
+            response_time_ms=response_time_ms,
+            service_tier=service_tier,
        )
    elif custom_llm_provider == "gemini":
        return gemini_cost_per_token(
@ -539,6 +542,7 @@ def cost_per_token(  # noqa: PLR0915
            usage=usage_block,
            response_time_ms=response_time_ms,
            request_model=request_model,
+            service_tier=service_tier,
        )
    else:
        model_info = _cached_get_model_info_helper(
--- a/litellm/files/main.py
+++ b/litellm/files/main.py
@ -10,6 +10,7 @@ import contextvars
 import time
 import uuid as uuid_module
 from functools import partial
+from types import MappingProxyType
 from typing import Any, Coroutine, Dict, Literal, Optional, Union, cast

 import httpx
@ -85,6 +86,16 @@ bedrock_files_instance = BedrockFilesHandler()
 #################################################


+def _add_trusted_model_credentials_to_litellm_params(
+    litellm_params_dict: Dict[str, Any], kwargs: Dict[str, Any]
+) -> None:
+    trusted_model_credentials = kwargs.get("_litellm_internal_model_credentials")
+    if isinstance(trusted_model_credentials, MappingProxyType):
+        litellm_params_dict["_litellm_internal_model_credentials"] = (
+            trusted_model_credentials
+        )
+
+
@client
 async def acreate_file(
    file: FileTypes,
@ -373,6 +384,10 @@ def file_retrieve(
            )
            if provider_config is not None:
                litellm_params_dict = get_litellm_params(**kwargs)
+                _add_trusted_model_credentials_to_litellm_params(
+                    litellm_params_dict=litellm_params_dict,
+                    kwargs=kwargs,
+                )
                litellm_params_dict["api_key"] = optional_params.api_key
                litellm_params_dict["api_base"] = optional_params.api_base

@ -497,6 +512,10 @@ def file_delete(
            pass
        optional_params = GenericLiteLLMParams(**kwargs)
        litellm_params_dict = get_litellm_params(**kwargs)
+        _add_trusted_model_credentials_to_litellm_params(
+            litellm_params_dict=litellm_params_dict,
+            kwargs=kwargs,
+        )
        ### TIMEOUT LOGIC ###
        timeout = optional_params.timeout or kwargs.get("request_timeout", 600) or 600
        # set timeout for 10 minutes by default
@ -846,6 +865,10 @@ def file_content(
    try:
        optional_params = GenericLiteLLMParams(**kwargs)
        litellm_params_dict = get_litellm_params(**kwargs)
+        _add_trusted_model_credentials_to_litellm_params(
+            litellm_params_dict=litellm_params_dict,
+            kwargs=kwargs,
+        )
        ### TIMEOUT LOGIC ###
        timeout = optional_params.timeout or kwargs.get("request_timeout", 600) or 600
        client = kwargs.get("client")
@ -993,6 +1016,7 @@ def file_content(
                vertex_location=vertex_ai_location,
                timeout=timeout,
                max_retries=optional_params.max_retries,
+                litellm_params=litellm_params_dict,
            )
        elif custom_llm_provider == "bedrock":
            response = bedrock_files_instance.file_content(
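The helper added in `litellm/files/main.py` forwards `_litellm_internal_model_credentials` only when the value is a `MappingProxyType`. Below is a minimal sketch of that gate. It assumes (the diff does not state this) that the read-only proxy type acts as a marker separating internally built credentials from a plain dict that could arrive through `kwargs`. `forward_trusted_credentials` is a hypothetical stand-in name, not the function in the diff.

```python
from types import MappingProxyType
from typing import Any, Dict

KEY = "_litellm_internal_model_credentials"


def forward_trusted_credentials(
    litellm_params: Dict[str, Any], kwargs: Dict[str, Any]
) -> None:
    """Copy the internal credentials only when they arrive as a read-only proxy."""
    creds = kwargs.get(KEY)
    if isinstance(creds, MappingProxyType):
        litellm_params[KEY] = creds


# A plain dict under the internal key is ignored.
params_from_dict: Dict[str, Any] = {}
forward_trusted_credentials(params_from_dict, {KEY: {"api_key": "x"}})

# The same data wrapped in MappingProxyType is forwarded unchanged.
params_from_proxy: Dict[str, Any] = {}
forward_trusted_credentials(
    params_from_proxy, {KEY: MappingProxyType({"api_key": "x"})}
)
```

The check is on the container type only; it says nothing about where the proxy was built, so the guarantee depends on callers never wrapping untrusted input themselves.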
--- a/litellm/integrations/arize/_utils.py
+++ b/litellm/integrations/arize/_utils.py
@ -220,23 +220,57 @@ def _set_structured_outputs(span: "Span", response_obj, msg_attrs, span_attrs):
            safe_set_attribute(span, f"{prefix}.{msg_attrs.MESSAGE_ROLE}", message_role)


+def _safe_get(obj, key, default=None):
+    """Read ``key`` from a dict-like or Pydantic-model-like object.
+
+    The arize/langfuse_otel logger receives ``usage`` objects from many sources:
+    plain dicts, litellm ``Usage`` (which exposes ``.get``), and raw OpenAI
+    Pydantic models (e.g. ``openai.types.completion_usage.CompletionUsage`` and
+    nested ``CompletionTokensDetails`` / ``OutputTokensDetails``) which do NOT
+    expose ``.get``. Calling ``.get`` on the latter raised ``AttributeError`` —
+    see https://github.com/BerriAI/litellm/issues/13672.
+    """
+    if obj is None:
+        return default
+    getter = getattr(obj, "get", None)
+    if callable(getter):
+        try:
+            return getter(key, default)
+        except TypeError:
+            # Some objects expose `.get` with a different signature; fall back to getattr
+            pass
+    return getattr(obj, key, default)
+
+
 def _set_usage_outputs(span: "Span", response_obj, span_attrs):
    usage = response_obj and response_obj.get("usage")
    if not usage:
        return

    safe_set_attribute(
-        span, span_attrs.LLM_TOKEN_COUNT_TOTAL, usage.get("total_tokens")
+        span, span_attrs.LLM_TOKEN_COUNT_TOTAL, _safe_get(usage, "total_tokens")
+    )
+    completion_tokens = _safe_get(usage, "completion_tokens") or _safe_get(
+        usage, "output_tokens"
    )
-    completion_tokens = usage.get("completion_tokens") or usage.get("output_tokens")
    if completion_tokens:
        safe_set_attribute(
            span, span_attrs.LLM_TOKEN_COUNT_COMPLETION, completion_tokens
        )
-    prompt_tokens = usage.get("prompt_tokens") or usage.get("input_tokens")
+    prompt_tokens = _safe_get(usage, "prompt_tokens") or _safe_get(
+        usage, "input_tokens"
+    )
    if prompt_tokens:
        safe_set_attribute(span, span_attrs.LLM_TOKEN_COUNT_PROMPT, prompt_tokens)
-    reasoning_tokens = usage.get("output_tokens_details", {}).get("reasoning_tokens")
+
+    # Reasoning tokens live in `completion_tokens_details` for Chat Completions
+    # API (Usage) and in `output_tokens_details` for Responses API
+    # (ResponseAPIUsage). Both nested objects may be plain Pydantic models
+    # without `.get`.
+    token_details = _safe_get(usage, "completion_tokens_details") or _safe_get(
+        usage, "output_tokens_details"
+    )
+    reasoning_tokens = _safe_get(token_details, "reasoning_tokens")
    if reasoning_tokens:
        safe_set_attribute(
            span,
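The `_safe_get` change in `litellm/integrations/arize/_utils.py` can be exercised in isolation. The sketch below copies its logic under the name `safe_get` and uses `SimpleNamespace` as a stand-in (an assumption made for the demo) for a Pydantic usage model that has attributes but no `.get`. The `reasoning_tokens` wrapper is hypothetical and mirrors the lookup order used in `_set_usage_outputs`.

```python
from types import SimpleNamespace


def safe_get(obj, key, default=None):
    """Read key from a dict-like object, else fall back to attribute access."""
    if obj is None:
        return default
    getter = getattr(obj, "get", None)
    if callable(getter):
        try:
            return getter(key, default)
        except TypeError:
            pass  # `.get` exists but takes different arguments
    return getattr(obj, key, default)


def reasoning_tokens(usage):
    # Chat Completions usage nests the count under completion_tokens_details,
    # Responses API usage under output_tokens_details.
    details = safe_get(usage, "completion_tokens_details") or safe_get(
        usage, "output_tokens_details"
    )
    return safe_get(details, "reasoning_tokens")


dict_usage = {"total_tokens": 30, "output_tokens_details": {"reasoning_tokens": 7}}
model_usage = SimpleNamespace(
    total_tokens=30,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=7),
)
```

Both shapes yield the same count, and a missing `usage` or missing details object yields `None` instead of raising `AttributeError`.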
--- a/litellm/integrations/arize/arize_phoenix_client.py
+++ b/litellm/integrations/arize/arize_phoenix_client.py
@ -2,11 +2,23 @@
 Arize Phoenix API client for fetching prompt versions from Arize Phoenix.
 """

+import urllib.parse
 from typing import Any, Dict, Optional

 from litellm.llms.custom_httpx.http_handler import HTTPHandler


+def _sanitize_id(identifier: str) -> str:
+    """Reject path traversal characters and URL-encode the identifier."""
+    if any(c in identifier for c in ("/", "\\", "#", "?")):
+        raise ValueError(
+            f"Invalid identifier {identifier!r}: contains disallowed characters"
+        )
+    if ".." in identifier:
+        raise ValueError(f"Invalid identifier {identifier!r}: path traversal detected")
+    return urllib.parse.quote(identifier, safe="")
+
+
 class ArizePhoenixClient:
    """
    Client for interacting with Arize Phoenix API to fetch prompt versions.
@ -53,7 +65,8 @@ class ArizePhoenixClient:
        Returns:
            Dictionary containing prompt version data, or None if not found
        """
-        url = f"{self.api_base}/v1/prompt_versions/{prompt_version_id}"
+        safe_id = _sanitize_id(prompt_version_id)
+        url = f"{self.api_base}/v1/prompt_versions/{safe_id}"

        try:
            # Use the underlying httpx client directly to avoid query param extraction
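To see what `_sanitize_id` accepts and rejects, here is a standalone copy of its logic under the name `sanitize_id`. The sample identifiers and the `is_rejected` helper are made up for illustration.

```python
import urllib.parse


def sanitize_id(identifier: str) -> str:
    """Reject path/URL control characters, then percent-encode everything else."""
    if any(c in identifier for c in ("/", "\\", "#", "?")):
        raise ValueError(
            f"Invalid identifier {identifier!r}: contains disallowed characters"
        )
    if ".." in identifier:
        raise ValueError(f"Invalid identifier {identifier!r}: path traversal detected")
    return urllib.parse.quote(identifier, safe="")


def is_rejected(identifier: str) -> bool:
    """Hypothetical helper: True when sanitize_id raises ValueError."""
    try:
        sanitize_id(identifier)
    except ValueError:
        return True
    return False


plain = sanitize_id("UHJvbXB0VmVyc2lvbjox")  # alphanumerics pass through unchanged
spaced = sanitize_id("my prompt=v1")  # space and '=' are percent-encoded
```

Because `safe=""` is passed to `quote`, every reserved character that survives the checks is encoded, so the identifier stays within a single path segment of the request URL.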
--- a/litellm/integrations/arize/arize_phoenix_prompt_manager.py
+++ b/litellm/integrations/arize/arize_phoenix_prompt_manager.py
@ -5,7 +5,8 @@ Fetches prompt versions from Arize Phoenix and provides workspace-based access c

 from typing import Any, Dict, List, Optional, Tuple, Union

-from jinja2 import DictLoader, Environment, select_autoescape
+from jinja2 import DictLoader, select_autoescape
+from jinja2.sandbox import ImmutableSandboxedEnvironment

 from litellm.integrations.custom_prompt_management import CustomPromptManagement
 from litellm.integrations.prompt_management_base import (
@ -74,7 +75,13 @@ class ArizePhoenixTemplateManager:
            api_key=self.api_key, api_base=self.api_base
        )

-        self.jinja_env = Environment(
+        # Templates fetched from Arize Phoenix come from external workspace
+        # users; in a plain `Environment()` a malicious template could reach
+        # `__class__.__init__.__globals__` and execute arbitrary code on the
+        # proxy host. The sandbox blocks that attribute traversal while
+        # leaving normal `{{ var }}` substitution intact. Matches the
+        # dotprompt manager's hardening.
+        self.jinja_env = ImmutableSandboxedEnvironment(
            loader=DictLoader({}),
            autoescape=select_autoescape(["html", "xml"]),
            # Use Mustache/Handlebars-style delimiters
--- a/litellm/integrations/bitbucket/bitbucket_client.py
+++ b/litellm/integrations/bitbucket/bitbucket_client.py
@ -3,11 +3,27 @@ BitBucket API client for fetching .prompt files from BitBucket repositories.
 """

 import base64
+import urllib.parse
 from typing import Any, Dict, List, Optional

 from litellm.llms.custom_httpx.http_handler import HTTPHandler


+def _sanitize_file_path(file_path: str) -> str:
+    """Reject path traversal and URL-encode each path segment."""
+    if "#" in file_path or "?" in file_path:
+        raise ValueError(
+            f"Invalid file path {file_path!r}: contains URL special characters"
+        )
+    parts = file_path.split("/")
+    for part in parts:
+        if part == "..":
+            raise ValueError(
+                f"Invalid file path {file_path!r}: path traversal detected"
+            )
+    return "/".join(urllib.parse.quote(part, safe="") for part in parts)
+
+
 class BitBucketClient:
    """
    Client for interacting with BitBucket API to fetch .prompt files.
@ -72,7 +88,8 @@ class BitBucketClient:
        Returns:
            File content as string, or None if file not found
        """
-        url = f"{self.base_url}/repositories/{self.workspace}/{self.repository}/src/{self.branch}/{file_path}"
+        safe_path = _sanitize_file_path(file_path)
+        url = f"{self.base_url}/repositories/{self.workspace}/{self.repository}/src/{self.branch}/{safe_path}"

        try:
            response = self.http_handler.get(url, headers=self.headers)
@ -119,7 +136,8 @@ class BitBucketClient:
        Returns:
            List of file paths
        """
-        url = f"{self.base_url}/repositories/{self.workspace}/{self.repository}/src/{self.branch}/{directory_path}"
+        safe_dir = _sanitize_file_path(directory_path) if directory_path else ""
+        url = f"{self.base_url}/repositories/{self.workspace}/{self.repository}/src/{self.branch}/{safe_dir}"

        try:
            response = self.http_handler.get(url, headers=self.headers)
@ -211,7 +229,8 @@ class BitBucketClient:
        Returns:
            Dictionary containing file metadata, or None if file not found
        """
-        url = f"{self.base_url}/repositories/{self.workspace}/{self.repository}/src/{self.branch}/{file_path}"
+        safe_path = _sanitize_file_path(file_path)
+        url = f"{self.base_url}/repositories/{self.workspace}/{self.repository}/src/{self.branch}/{safe_path}"

        try:
            # Use GET with Range header to get just the headers (HEAD equivalent)
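The per-segment handling in `_sanitize_file_path` is easier to follow with concrete inputs. The sketch below copies its logic under the name `sanitize_file_path`; the sample paths and the `is_rejected` helper are invented for the demo.

```python
import urllib.parse


def sanitize_file_path(file_path: str) -> str:
    """Reject '#', '?' and '..' segments, then percent-encode each segment."""
    if "#" in file_path or "?" in file_path:
        raise ValueError(
            f"Invalid file path {file_path!r}: contains URL special characters"
        )
    parts = file_path.split("/")
    for part in parts:
        if part == "..":
            raise ValueError(
                f"Invalid file path {file_path!r}: path traversal detected"
            )
    return "/".join(urllib.parse.quote(part, safe="") for part in parts)


def is_rejected(file_path: str) -> bool:
    """Hypothetical helper: True when sanitize_file_path raises ValueError."""
    try:
        sanitize_file_path(file_path)
    except ValueError:
        return True
    return False


encoded = sanitize_file_path("prompts/my prompt.prompt")
dotted = sanitize_file_path("..hidden/file.prompt")  # "..hidden" is not a ".." segment
```

Unlike `_sanitize_id`, which rejects any occurrence of `..`, this function only rejects a segment that is exactly `..`, so file names that merely contain two dots remain usable while `/` keeps its role as the segment separator.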
--- a/litellm/integrations/bitbucket/bitbucket_prompt_manager.py
+++ b/litellm/integrations/bitbucket/bitbucket_prompt_manager.py
@ -5,7 +5,8 @@ Fetches .prompt files from BitBucket repositories and provides team-based access

 from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union

-from jinja2 import DictLoader, Environment, select_autoescape
+from jinja2 import DictLoader, select_autoescape
+from jinja2.sandbox import ImmutableSandboxedEnvironment

 from litellm.integrations.custom_prompt_management import CustomPromptManagement

@ -74,7 +75,13 @@ class BitBucketTemplateManager:
        self.prompts: Dict[str, BitBucketPromptTemplate] = {}
        self.bitbucket_client = BitBucketClient(bitbucket_config)

-        self.jinja_env = Environment(
+        # Templates fetched from a BitBucket repo are not trustworthy:
+        # anyone with repo write access can ship Jinja syntax that, in a
+        # plain `Environment()`, would reach `__class__.__init__.__globals__`
+        # and pivot into RCE on the proxy host. The sandbox blocks that
+        # attribute traversal while leaving normal `{{ var }}` substitution
+        # intact. Matches the dotprompt manager's hardening.
+        self.jinja_env = ImmutableSandboxedEnvironment(
            loader=DictLoader({}),
            autoescape=select_autoescape(["html", "xml"]),
            # Use Handlebars-style delimiters to match Dotprompt spec
--- a/litellm/integrations/custom_sso_handler.py
+++ b/litellm/integrations/custom_sso_handler.py
@ -18,6 +18,17 @@ class CustomSSOLoginHandler(CustomLogger):
        self,
        request: Request,
    ) -> OpenID:
+        from litellm.proxy.auth.trusted_proxy_utils import (
+            require_trusted_proxy_request,
+        )
+        from litellm.proxy.proxy_server import general_settings
+
+        require_trusted_proxy_request(
+            request=request,
+            general_settings=general_settings,
+            feature_name="Custom UI SSO",
+        )
+
        request_headers_dict = dict(request.headers)
        return OpenID(
            id=request_headers_dict.get("x-litellm-user-id"),
--- a/litellm/integrations/gcs_bucket/gcs_bucket.py
+++ b/litellm/integrations/gcs_bucket/gcs_bucket.py
@ -6,12 +6,14 @@ import time
 from litellm._uuid import uuid
 from datetime import datetime, timedelta, timezone
 from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple
-from urllib.parse import quote

 from litellm._logging import verbose_logger
 from litellm.constants import LITELLM_ASYNCIO_QUEUE_MAXSIZE
 from litellm.integrations.additional_logging_utils import AdditionalLoggingUtils
 from litellm.integrations.gcs_bucket.gcs_bucket_base import GCSBucketBase
+from litellm.litellm_core_utils.cloud_storage_security import (
+    sanitize_cloud_object_component,
+)
 from litellm.proxy._types import CommonProxyErrors
 from litellm.types.integrations.base_health_check import IntegrationHealthCheckStatus
 from litellm.types.integrations.gcs_bucket import *
@ -335,7 +337,11 @@ class GCSBucketLogger(GCSBucketBase, AdditionalLoggingUtils):
        _litellm_params = kwargs.get("litellm_params", None) or {}
        _metadata = _litellm_params.get("metadata", None) or {}
        if "gcs_log_id" in _metadata:
-            object_name = _metadata["gcs_log_id"]
+            safe_log_id = sanitize_cloud_object_component(
+                _metadata.get("gcs_log_id"), fallback=""
+            )
+            if safe_log_id:
+                object_name = f"{current_date}/custom-{uuid.uuid4().hex}-{safe_log_id}"

        return object_name

@ -367,8 +373,7 @@ class GCSBucketLogger(GCSBucketBase, AdditionalLoggingUtils):
                    request_date_str=date_str,
                    response_id=request_id,
                )
-                encoded_object_name = quote(object_name, safe="")
-                response = await self.download_gcs_object(encoded_object_name)
+                response = await self.download_gcs_object(object_name)

                if response is not None:
                    loaded_response = json.loads(response)
--- a/litellm/integrations/gcs_bucket/gcs_bucket_base.py
+++ b/litellm/integrations/gcs_bucket/gcs_bucket_base.py
@ -11,6 +11,10 @@ from litellm.integrations.gcs_bucket.gcs_bucket_mock_client import (

 from litellm._logging import verbose_logger
 from litellm.integrations.custom_batch_logger import CustomBatchLogger
+from litellm.litellm_core_utils.cloud_storage_security import (
+    encode_gcs_object_name_for_url,
+    split_configured_cloud_bucket_name,
+)
 from litellm.llms.custom_httpx.http_handler import (
    get_async_httpx_client,
    httpxSpecialProvider,
@ -133,8 +137,8 @@ class GCSBucketBase(CustomBatchLogger):
            - Returns: bucket_name="my-bucket", object_name="my-folder/dev/my-object"

        """
-        if "/" in bucket_name:
-            bucket_name, prefix = bucket_name.split("/", 1)
+        bucket_name, prefix = split_configured_cloud_bucket_name(bucket_name)
+        if prefix:
            object_name = f"{prefix}/{object_name}"
            return bucket_name, object_name
        return bucket_name, object_name
@ -248,6 +252,7 @@ class GCSBucketBase(CustomBatchLogger):
                bucket_name=bucket_name,
                object_name=object_name,
            )
+            object_name = encode_gcs_object_name_for_url(object_name)

            url = f"https://storage.googleapis.com/storage/v1/b/{bucket_name}/o/{object_name}?alt=media"

@ -288,6 +293,7 @@ class GCSBucketBase(CustomBatchLogger):
                bucket_name=bucket_name,
                object_name=object_name,
            )
+            object_name = encode_gcs_object_name_for_url(object_name)

            url = f"https://storage.googleapis.com/storage/v1/b/{bucket_name}/o/{object_name}"

@ -334,10 +340,11 @@ class GCSBucketBase(CustomBatchLogger):
            bucket_name=bucket_name,
            object_name=object_name,
        )
+        encoded_object_name = encode_gcs_object_name_for_url(object_name)

        response = await self.async_httpx_client.post(
            headers=headers,
-            url=f"https://storage.googleapis.com/upload/storage/v1/b/{bucket_name}/o?uploadType=media&name={object_name}",
+            url=f"https://storage.googleapis.com/upload/storage/v1/b/{bucket_name}/o?uploadType=media&name={encoded_object_name}",
            data=json_logged_payload,
        )

--- a/litellm/integrations/gitlab/gitlab_prompt_manager.py
+++ b/litellm/integrations/gitlab/gitlab_prompt_manager.py
@ -4,7 +4,8 @@ GitLab prompt manager with configurable prompts folder.

 from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union

-from jinja2 import DictLoader, Environment, select_autoescape
+from jinja2 import DictLoader, select_autoescape
+from jinja2.sandbox import ImmutableSandboxedEnvironment

 from litellm.integrations.custom_prompt_management import CustomPromptManagement

@ -90,7 +91,13 @@ class GitLabTemplateManager:
            or ""
        ).strip("/")

-        self.jinja_env = Environment(
+        # Templates fetched from a GitLab repo are not trustworthy:
+        # anyone with repo write access can ship Jinja syntax that, in a
+        # plain `Environment()`, would reach `__class__.__init__.__globals__`
+        # and pivot into RCE on the proxy host. The sandbox blocks that
+        # attribute traversal while leaving normal `{{ var }}` substitution
+        # intact. Matches the dotprompt manager's hardening.
+        self.jinja_env = ImmutableSandboxedEnvironment(
            loader=DictLoader({}),
            autoescape=select_autoescape(["html", "xml"]),
            variable_start_string="{{",
--- a/litellm/integrations/langfuse/langfuse.py
+++ b/litellm/integrations/langfuse/langfuse.py
@ -90,6 +90,29 @@ def _extract_cache_read_input_tokens(usage_obj) -> int:
    return cache_read_input_tokens


+def resolve_langfuse_credentials(
+    langfuse_public_key=None,
+    langfuse_secret=None,
+    langfuse_secret_key=None,
+    langfuse_host=None,
+    allow_env_credentials: bool = True,
+):
+    if allow_env_credentials is False and langfuse_host is not None:
+        secret_key = langfuse_secret or langfuse_secret_key
+        public_key = langfuse_public_key
+    else:
+        secret_key = (
+            langfuse_secret or langfuse_secret_key or os.getenv("LANGFUSE_SECRET_KEY")
+        )
+        public_key = langfuse_public_key or os.getenv("LANGFUSE_PUBLIC_KEY")
+
+    resolved_host = langfuse_host or os.getenv(
+        "LANGFUSE_HOST", "https://cloud.langfuse.com"
+    )
+
+    return public_key, secret_key, resolved_host
+
+
 class LangFuseLogger:
    # Class variables or attributes
    def __init__(
@ -98,6 +121,7 @@ class LangFuseLogger:
        langfuse_secret=None,
        langfuse_host=None,
        flush_interval=1,
+        allow_env_credentials: bool = True,
    ):
        try:
            import langfuse
@ -106,11 +130,13 @@ class LangFuseLogger:
            raise Exception(
                f"\033[91mLangfuse not installed, try running 'pip install langfuse' to fix this error: {e}\n{traceback.format_exc()}\033[0m"
            )
-        # Instance variables
-        self.secret_key = langfuse_secret or os.getenv("LANGFUSE_SECRET_KEY")
-        self.public_key = langfuse_public_key or os.getenv("LANGFUSE_PUBLIC_KEY")
-        self.langfuse_host = langfuse_host or os.getenv(
-            "LANGFUSE_HOST", "https://cloud.langfuse.com"
+        self.public_key, self.secret_key, self.langfuse_host = (
+            resolve_langfuse_credentials(
+                langfuse_public_key=langfuse_public_key,
+                langfuse_secret=langfuse_secret,
+                langfuse_host=langfuse_host,
+                allow_env_credentials=allow_env_credentials,
+            )
        )
        if not (
            self.langfuse_host.startswith("http://")
@ -160,9 +186,10 @@ class LangFuseLogger:
                project_id = None

        if os.getenv("UPSTREAM_LANGFUSE_SECRET_KEY") is not None:
+            upstream_langfuse_debug_env = os.getenv("UPSTREAM_LANGFUSE_DEBUG")
            upstream_langfuse_debug = (
-                str_to_bool(self.upstream_langfuse_debug)
-                if self.upstream_langfuse_debug is not None
+                str_to_bool(upstream_langfuse_debug_env)
+                if upstream_langfuse_debug_env is not None
                else None
            )
            self.upstream_langfuse_secret_key = os.getenv(
@ -173,7 +200,7 @@ class LangFuseLogger:
            )
            self.upstream_langfuse_host = os.getenv("UPSTREAM_LANGFUSE_HOST")
            self.upstream_langfuse_release = os.getenv("UPSTREAM_LANGFUSE_RELEASE")
-            self.upstream_langfuse_debug = os.getenv("UPSTREAM_LANGFUSE_DEBUG")
+            self.upstream_langfuse_debug = upstream_langfuse_debug_env
            self.upstream_langfuse = Langfuse(
                public_key=self.upstream_langfuse_public_key,
                secret_key=self.upstream_langfuse_secret_key,
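The effect of `allow_env_credentials` in `resolve_langfuse_credentials` shows up most clearly when a host is supplied per request. The sketch below restates the resolution logic with shortened parameter names and one addition that is not in the diff: a hypothetical `env` mapping parameter, so the demo does not have to mutate `os.environ`. The presumed intent, which the diff does not spell out, is that server-side keys are not paired with a caller-chosen host.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    public_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    host: Optional[str] = None,
    allow_env_credentials: bool = True,
    env: Mapping[str, str] = os.environ,  # hypothetical parameter, demo only
) -> Tuple[Optional[str], Optional[str], str]:
    if allow_env_credentials is False and host is not None:
        # Explicit host without env fallback: use only what was passed in.
        resolved_public, resolved_secret = public_key, secret_key
    else:
        resolved_public = public_key or env.get("LANGFUSE_PUBLIC_KEY")
        resolved_secret = secret_key or env.get("LANGFUSE_SECRET_KEY")
    resolved_host = host or env.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
    return resolved_public, resolved_secret, resolved_host


server_env = {"LANGFUSE_PUBLIC_KEY": "pk-server", "LANGFUSE_SECRET_KEY": "sk-server"}

# No per-request host: environment keys and the default host are used.
default_case = resolve_credentials(env=server_env)

# Per-request host, passed the way the call sites in this diff pass it.
custom_host = "https://langfuse.example.com"
isolated_case = resolve_credentials(
    host=custom_host, allow_env_credentials=custom_host is None, env=server_env
)
```

With a custom host and no explicit keys, both keys resolve to `None` rather than to the server's environment values.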
--- a/litellm/integrations/langfuse/langfuse_handler.py
+++ b/litellm/integrations/langfuse/langfuse_handler.py
@ -115,8 +115,10 @@ class LangFuseHandler:

        langfuse_logger = LangFuseLogger(
            langfuse_public_key=credentials.get("langfuse_public_key"),
-            langfuse_secret=credentials.get("langfuse_secret"),
+            langfuse_secret=credentials.get("langfuse_secret")
+            or credentials.get("langfuse_secret_key"),
            langfuse_host=credentials.get("langfuse_host"),
+            allow_env_credentials=credentials.get("langfuse_host") is None,
        )
        in_memory_dynamic_logger_cache.set_cache(
            credentials=credentials,
--- a/litellm/integrations/langfuse/langfuse_prompt_management.py
+++ b/litellm/integrations/langfuse/langfuse_prompt_management.py
@ -20,7 +20,7 @@ from ...litellm_core_utils.specialty_caches.dynamic_logging_cache import (
    DynamicLoggingCache,
 )
 from ..prompt_management_base import PromptManagementBase
-from .langfuse import LangFuseLogger
+from .langfuse import LangFuseLogger, resolve_langfuse_credentials
 from .langfuse_handler import LangFuseHandler

 if TYPE_CHECKING:
@ -46,6 +46,7 @@ def langfuse_client_init(
    langfuse_secret_key=None,
    langfuse_host=None,
    flush_interval=1,
+    allow_env_credentials: bool = True,
 ) -> LangfuseClass:
    """
    Initialize Langfuse client with caching to prevent multiple initializations.
@ -70,14 +71,12 @@ def langfuse_client_init(
            f"\033[91mLangfuse not installed, try running 'pip install langfuse' to fix this error: {e}\n\033[0m"
        )

-    # Instance variables
-
-    secret_key = (
-        langfuse_secret or langfuse_secret_key or os.getenv("LANGFUSE_SECRET_KEY")
-    )
-    public_key = langfuse_public_key or os.getenv("LANGFUSE_PUBLIC_KEY")
-    langfuse_host = langfuse_host or os.getenv(
-        "LANGFUSE_HOST", "https://cloud.langfuse.com"
+    public_key, secret_key, langfuse_host = resolve_langfuse_credentials(
+        langfuse_public_key=langfuse_public_key,
+        langfuse_secret=langfuse_secret,
+        langfuse_secret_key=langfuse_secret_key,
+        langfuse_host=langfuse_host,
+        allow_env_credentials=allow_env_credentials,
    )

    if not (
@ -222,6 +221,7 @@ class LangfusePromptManagement(LangFuseLogger, PromptManagementBase, CustomLogge
            langfuse_secret=dynamic_callback_params.get("langfuse_secret"),
            langfuse_secret_key=dynamic_callback_params.get("langfuse_secret_key"),
            langfuse_host=dynamic_callback_params.get("langfuse_host"),
+            allow_env_credentials=dynamic_callback_params.get("langfuse_host") is None,
        )
        langfuse_prompt_client = self._get_prompt_from_id(
            langfuse_prompt_id=prompt_id,
@ -246,6 +246,7 @@ class LangfusePromptManagement(LangFuseLogger, PromptManagementBase, CustomLogge
            langfuse_secret=dynamic_callback_params.get("langfuse_secret"),
            langfuse_secret_key=dynamic_callback_params.get("langfuse_secret_key"),
            langfuse_host=dynamic_callback_params.get("langfuse_host"),
+            allow_env_credentials=dynamic_callback_params.get("langfuse_host") is None,
        )
        langfuse_prompt_client = self._get_prompt_from_id(
            langfuse_prompt_id=prompt_id,
--- a/litellm/integrations/langsmith.py
+++ b/litellm/integrations/langsmith.py
@ -19,6 +19,7 @@ from litellm.integrations.langsmith_mock_client import (
    create_mock_langsmith_client,
    should_use_langsmith_mock,
 )
+from litellm.litellm_core_utils.redact_messages import redact_user_api_key_info
 from litellm.llms.custom_httpx.http_handler import (
    get_async_httpx_client,
    httpxSpecialProvider,
@ -112,17 +113,28 @@ class LangsmithLogger(CustomBatchLogger):
        langsmith_project: Optional[str] = None,
        langsmith_base_url: Optional[str] = None,
        langsmith_tenant_id: Optional[str] = None,
+        allow_env_credentials: bool = True,
    ) -> LangsmithCredentialsObject:
-        _credentials_api_key = langsmith_api_key or os.getenv("LANGSMITH_API_KEY")
-        _credentials_project = (
-            langsmith_project or os.getenv("LANGSMITH_PROJECT") or "litellm-completion"
-        )
-        _credentials_base_url = (
-            langsmith_base_url
-            or os.getenv("LANGSMITH_BASE_URL")
-            or "https://api.smith.langchain.com"
-        )
-        _credentials_tenant_id = langsmith_tenant_id or os.getenv("LANGSMITH_TENANT_ID")
+        if allow_env_credentials is False and langsmith_base_url is not None:
+            _credentials_api_key = langsmith_api_key
+            _credentials_project = langsmith_project or "litellm-completion"
+            _credentials_base_url = langsmith_base_url
+            _credentials_tenant_id = langsmith_tenant_id
+        else:
+            _credentials_api_key = langsmith_api_key or os.getenv("LANGSMITH_API_KEY")
+            _credentials_project = (
+                langsmith_project
+                or os.getenv("LANGSMITH_PROJECT")
+                or "litellm-completion"
+            )
+            _credentials_base_url = (
+                langsmith_base_url
+                or os.getenv("LANGSMITH_BASE_URL")
+                or "https://api.smith.langchain.com"
+            )
+            _credentials_tenant_id = langsmith_tenant_id or os.getenv(
+                "LANGSMITH_TENANT_ID"
+            )

        return LangsmithCredentialsObject(
            LANGSMITH_API_KEY=_credentials_api_key,
@ -153,6 +165,15 @@ class LangsmithLogger(CustomBatchLogger):
            for key in ("session_id", "thread_id", "conversation_id"):
                if key in requester_metadata and key not in extra_metadata:
                    extra_metadata[key] = requester_metadata[key]
+
+        # redact_user_api_key_info is shallow (top-level keys only), so also scrub
+        # nested requester_metadata: LangSmith forwards the whole dict into `extra`
+        extra_metadata = redact_user_api_key_info(metadata=extra_metadata)
+        nested = extra_metadata.get("requester_metadata")
+        if isinstance(nested, dict):
+            extra_metadata["requester_metadata"] = redact_user_api_key_info(
+                metadata=nested
+            )
        return extra_metadata

    def _build_outputs_with_usage(
@ -540,6 +561,10 @@ class LangsmithLogger(CustomBatchLogger):
                langsmith_tenant_id=standard_callback_dynamic_params.get(
                    "langsmith_tenant_id", None
                ),
+                allow_env_credentials=standard_callback_dynamic_params.get(
+                    "langsmith_base_url", None
+                )
+                is None,
            )
        else:
            credentials = self.default_credentials
--- a/litellm/integrations/opentelemetry.py
+++ b/litellm/integrations/opentelemetry.py
@ -69,6 +69,8 @@ class OpenTelemetryConfig:
    deployment_environment: Optional[str] = None
    model_id: Optional[str] = None
    ignore_context_propagation: Optional[bool] = None
+    # When True, create private tracer/meter/logger providers instead of reusing or setting the global ones.
+    skip_set_global: bool = False

    def __post_init__(self) -> None:
        # If endpoint is specified but exporter is still the default "console",
@ -259,16 +261,21 @@ class OpenTelemetry(CustomLogger):
        try:
            existing_provider = get_existing_provider_fn()

-            # If a real SDK provider exists (set by another SDK like Langfuse), use it
-            # This uses a positive check for SDK providers instead of a negative check for proxy providers
            if isinstance(existing_provider, sdk_provider_class):
-                verbose_logger.debug(
-                    "OpenTelemetry: Using existing %s: %s",
-                    provider_name,
-                    type(existing_provider).__name__,
-                )
-                provider = existing_provider
-                # Don't call set_provider to preserve existing context
+                if skip_set_global:
+                    verbose_logger.debug(
+                        "OpenTelemetry: existing %s found but skip_set_global=True; creating private %s for isolation",
+                        provider_name,
+                        provider_name,
+                    )
+                    provider = create_new_provider_fn()
+                else:
+                    verbose_logger.debug(
+                        "OpenTelemetry: Using existing %s: %s",
+                        provider_name,
+                        type(existing_provider).__name__,
+                    )
+                    provider = existing_provider
            else:
                # Default proxy provider or unknown type, create our own
                verbose_logger.debug("OpenTelemetry: Creating new %s", provider_name)
@ -293,6 +300,12 @@ class OpenTelemetry(CustomLogger):

        return provider

+    def _skip_set_global(self) -> bool:
+        # langfuse_otel relies on the Langfuse SDK's providers; don't overwrite them.
+        return self.config.skip_set_global or (
+            hasattr(self, "callback_name") and self.callback_name == "langfuse_otel"
+        )
+
    def _init_tracing(self, tracer_provider):
        from opentelemetry import trace
        from opentelemetry.sdk.trace import TracerProvider
@ -303,11 +316,6 @@ class OpenTelemetry(CustomLogger):
            provider.add_span_processor(self._get_span_processor())
            return provider

-        # CRITICAL FIX: For Langfuse OTEL, skip setting global provider to prevent interference
-        skip_global = (
-            hasattr(self, "callback_name") and self.callback_name == "langfuse_otel"
-        )
-
        tracer_provider = self._get_or_create_provider(
            provider=tracer_provider,
            provider_name="TracerProvider",
@ -315,16 +323,18 @@ class OpenTelemetry(CustomLogger):
            sdk_provider_class=TracerProvider,
            create_new_provider_fn=create_tracer_provider,
            set_provider_fn=trace.set_tracer_provider,
-            skip_set_global=skip_global,
+            skip_set_global=self._skip_set_global(),
        )

        # Grab our tracer from the TracerProvider (not from global context)
        # This ensures we use the provided TracerProvider (e.g., for testing)
        self.tracer = tracer_provider.get_tracer(LITELLM_TRACER_NAME)
+        self._tracer_provider = tracer_provider
        self.span_kind = SpanKind

    def _init_metrics(self, meter_provider):
        if not self.config.enable_metrics:
+            self._meter_provider = None
            self._operation_duration_histogram = None
            self._token_usage_histogram = None
            self._cost_histogram = None
@ -350,7 +360,9 @@ class OpenTelemetry(CustomLogger):
            sdk_provider_class=MeterProvider,
            create_new_provider_fn=create_meter_provider,
            set_provider_fn=metrics.set_meter_provider,
+            skip_set_global=self._skip_set_global(),
        )
+        self._meter_provider = meter_provider

        meter = meter_provider.get_meter(__name__)

@ -388,6 +400,7 @@ class OpenTelemetry(CustomLogger):
    def _init_logs(self, logger_provider):
        # nothing to do if events disabled
        if not self.config.enable_events:
+            self._logger_provider = None
            return

        from opentelemetry._logs import get_logger_provider, set_logger_provider
@ -404,13 +417,14 @@ class OpenTelemetry(CustomLogger):
            )
            return provider

-        self._get_or_create_provider(
+        self._logger_provider = self._get_or_create_provider(
            provider=logger_provider,
            provider_name="LoggerProvider",
            get_existing_provider_fn=get_logger_provider,
            sdk_provider_class=OTLoggerProvider,
            create_new_provider_fn=create_logger_provider,
            set_provider_fn=set_logger_provider,
+            skip_set_global=self._skip_set_global(),
        )

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
@ -1073,7 +1087,7 @@ class OpenTelemetry(CustomLogger):
        # See: https://github.com/open-telemetry/opentelemetry-python/pull/4676
        # TODO: Refactor to use the proper OTEL Logs API instead of directly creating SDK LogRecords

-        from opentelemetry._logs import SeverityNumber, get_logger
+        from opentelemetry._logs import SeverityNumber

        try:
            from opentelemetry.sdk._logs import (  # type: ignore[attr-defined]  # OTEL < 1.39.0
@ -1084,7 +1098,10 @@ class OpenTelemetry(CustomLogger):
                LogRecord as SdkLogRecord,  # type: ignore[attr-defined]  # OTEL >= 1.39.0
            )

-        otel_logger = get_logger(LITELLM_LOGGER_NAME)
+        # Resolve through the handler's own LoggerProvider (which may be a
+        # private one when skip_set_global=True) rather than the module-level
+        # get_logger() which always goes through the global provider.
+        otel_logger = self._logger_provider.get_logger(LITELLM_LOGGER_NAME)

        parent_ctx = span.get_span_context()
        provider = (kwargs.get("litellm_params") or {}).get(
--- a/litellm/integrations/prometheus.py
+++ b/litellm/integrations/prometheus.py
@ -265,6 +265,7 @@ class PrometheusLogger(CustomLogger):
            ########################################
            # LiteLLM Virtual API KEY metrics
            ########################################
+
            # Remaining MODEL RPM limit for API Key
            self.litellm_remaining_api_key_requests_for_model = self._gauge_factory(
                "litellm_remaining_api_key_requests_for_model",
@ -1928,7 +1929,7 @@ class PrometheusLogger(CustomLogger):
            or _litellm_params_metadata.get("user_agent"),
        }

-    def set_llm_deployment_failure_metrics(self, request_kwargs: dict):
+    def set_llm_deployment_failure_metrics(self, request_kwargs: dict):  # noqa: PLR0915
        """
        Sets Failure metrics when an LLM API call fails

@ -2006,17 +2007,32 @@ class PrometheusLogger(CustomLogger):
                    if code is not None:
                        exception_status = str(code)

-            # Create enum_values for the label factory (always create for use in different metrics)
+            # On LiteLLM-side rejects (no deployment picked), route request_kwargs["model"]
+            # into requested_model and leave deployment-scoped labels empty.
+            deployment_selected = bool(model_id)
+            if deployment_selected:
+                label_litellm_model_name = litellm_model_name
+                label_model_id = model_id
+                label_api_base = api_base
+                label_api_provider = llm_provider
+                label_requested_model = model_group or litellm_model_name
+            else:
+                label_litellm_model_name = ""
+                label_model_id = ""
+                label_api_base = ""
+                label_api_provider = ""
+                label_requested_model = litellm_model_name or model_group or ""
+
            enum_values = UserAPIKeyLabelValues(
-                litellm_model_name=litellm_model_name,
-                model_id=model_id,
-                api_base=api_base,
-                api_provider=llm_provider,
+                litellm_model_name=label_litellm_model_name,
+                model_id=label_model_id,
+                api_base=label_api_base,
+                api_provider=label_api_provider,
                exception_status=exception_status,
                exception_class=(
                    self._get_exception_class_name(exception) if exception else None
                ),
-                requested_model=model_group or litellm_model_name,
+                requested_model=label_requested_model,
                hashed_api_key=hashed_api_key,
                api_key_alias=api_key_alias,
                team=team,
@ -2030,12 +2046,14 @@ class PrometheusLogger(CustomLogger):
            log these labels
            ["litellm_model_name", "model_id", "api_base", "api_provider"]
            """
-            self.set_deployment_partial_outage(
-                litellm_model_name=litellm_model_name or "",
-                model_id=model_id,
-                api_base=api_base,
-                api_provider=llm_provider or "",
-            )
+            # Only mark a deployment outage when one was actually picked.
+            if deployment_selected:
+                self.set_deployment_partial_outage(
+                    litellm_model_name=litellm_model_name or "",
+                    model_id=model_id,
+                    api_base=api_base,
+                    api_provider=llm_provider or "",
+                )
            _deployment_label_ctx = PrometheusLabelFactoryContext(enum_values)
            if exception is not None:
                PrometheusLogger._inc_labeled_counter(
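Reviewer note: the label routing in the hunk above can be read as a small pure function. The sketch below is a simplified illustration, not the code in `prometheus.py`; the function name `failure_labels` is hypothetical, and it coerces missing values to empty strings in both branches, which the real code only does for the no-deployment branch.

```python
from typing import Dict, Optional


def failure_labels(
    model_id: Optional[str],
    litellm_model_name: Optional[str],
    model_group: Optional[str],
    api_base: Optional[str],
    llm_provider: Optional[str],
) -> Dict[str, str]:
    """Hypothetical helper mirroring the branch on `deployment_selected`."""
    if model_id:
        # A deployment was picked: keep deployment-scoped labels and prefer
        # the model group as the requested model.
        return {
            "litellm_model_name": litellm_model_name or "",
            "model_id": model_id,
            "api_base": api_base or "",
            "api_provider": llm_provider or "",
            "requested_model": model_group or litellm_model_name or "",
        }
    # LiteLLM-side reject: no deployment exists, so deployment-scoped labels
    # stay empty and the model from the request becomes requested_model.
    return {
        "litellm_model_name": "",
        "model_id": "",
        "api_base": "",
        "api_provider": "",
        "requested_model": litellm_model_name or model_group or "",
    }
```

With no `model_id`, only `requested_model` carries a value, so the failure counter does not attribute the reject to a deployment that was never selected.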
--- a/litellm/integrations/prometheus_helpers/prometheus_api.py
+++ b/litellm/integrations/prometheus_helpers/prometheus_api.py
@ -2,6 +2,7 @@
 Helper functions to query prometheus API
 """

+import json
 import time
 from datetime import datetime, timedelta
 from typing import Optional
@ -81,6 +82,24 @@ def is_prometheus_connected() -> bool:
    return False


+def _quote_promql_string_literal(value: str) -> str:
+    """Render ``value`` as a PromQL double-quoted string literal.
+
+    PromQL string literals follow Go's escape rules
+    (https://prometheus.io/docs/prometheus/latest/querying/basics/): a
+    backslash begins an escape sequence and a bare ``"`` ends the literal.
+    Without escaping, callers that accept arbitrary user-supplied values
+    (like the ``api_key`` filter on ``/global/spend/logs``) can inject extra
+    label matchers or selectors and read cross-tenant metrics.
+
+    JSON's quoting rules are a strict subset of Go's, so ``json.dumps`` of
+    a Python string produces a literal Prometheus accepts: ``\\``, ``\\"``,
+    and the standard ``\\n`` / ``\\t`` / ``\\uNNNN`` control-character
+    escapes. The returned value already includes the surrounding quotes.
+    """
+    return json.dumps(value, ensure_ascii=False)
+
+
 async def get_daily_spend_from_prometheus(api_key: Optional[str]):
    """
    Expected Response Format:
@ -109,8 +128,11 @@ async def get_daily_spend_from_prometheus(api_key: Optional[str]):
    if api_key is None:
        query = "sum(delta(litellm_spend_metric_total[1d]))"
    else:
+        quoted_api_key = _quote_promql_string_literal(api_key)
        query = (
-            f'sum(delta(litellm_spend_metric_total{{hashed_api_key="{api_key}"}}[1d]))'
+            "sum(delta(litellm_spend_metric_total{"
+            f"hashed_api_key={quoted_api_key}"
+            "}[1d]))"
        )

    params = {
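Reviewer note: a minimal, standalone sketch of why the `json.dumps` quoting closes the injection path. The names `quote_promql` and `build_query` are illustrative stand-ins for the helper and query construction in the hunk above, and the payload is an invented example.

```python
import json


def quote_promql(value: str) -> str:
    # json.dumps escapes backslashes, double quotes and control characters,
    # and returns the value already wrapped in double quotes.
    return json.dumps(value, ensure_ascii=False)


def build_query(api_key: str) -> str:
    # Same shape as the patched query: the quoted literal is spliced in
    # between the literal braces of the label matcher.
    return (
        "sum(delta(litellm_spend_metric_total{"
        f"hashed_api_key={quote_promql(api_key)}"
        "}[1d]))"
    )


# An invented payload that tries to close the string literal early and
# append a second selector.
payload = 'x"} or litellm_spend_metric_total{team="other'
print(build_query(payload))
```

Every double quote inside the payload comes out as `\"`, so the whole value stays inside one string literal and cannot terminate the `hashed_api_key` matcher.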
--- a/litellm/litellm_core_utils/cli_token_utils.py
+++ b/litellm/litellm_core_utils/cli_token_utils.py
@ -31,15 +31,23 @@ def load_cli_token() -> Optional[dict]:
        return None


-def get_litellm_gateway_api_key() -> Optional[str]:
+def get_litellm_gateway_api_key(
+    expected_base_url: Optional[str] = None,
+) -> Optional[str]:
    """
    Get the stored CLI API key for use with LiteLLM SDK.

    This function reads the token file created by `litellm-proxy login`
    and returns the API key for use in Python scripts.

+    Args:
+        expected_base_url: When provided, the key is only returned if it was
+            originally issued for this URL. Pass the target server URL to
+            prevent credential leakage when the client is pointed at a
+            different (possibly malicious) server.
+
    Returns:
-        str: The API key if found, None otherwise
+        str: The API key if found (and origin matches), None otherwise

    Example:
        >>> import litellm
@ -53,6 +61,10 @@ def get_litellm_gateway_api_key() -> Optional[str]:
        >>>     )
    """
    token_data = load_cli_token()
-    if token_data and "key" in token_data:
-        return token_data["key"]
-    return None
+    if not token_data or "key" not in token_data:
+        return None
+    if expected_base_url is not None:
+        stored_url = token_data.get("base_url")
+        if stored_url != expected_base_url.rstrip("/"):
+            return None
+    return token_data["key"]
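Reviewer note: the origin check can be exercised without a token file. The sketch below takes the token dict as a parameter instead of calling `load_cli_token()`; the function name `key_for_server` and the example URLs are hypothetical. It assumes the stored `base_url` was saved without a trailing slash, which this diff does not show.

```python
from typing import Optional


def key_for_server(
    token_data: Optional[dict], expected_base_url: Optional[str] = None
) -> Optional[str]:
    """Return the stored key only when it was issued for the expected server."""
    if not token_data or "key" not in token_data:
        return None
    if expected_base_url is not None:
        # Only the caller-supplied URL is normalized, as in the patch above.
        if token_data.get("base_url") != expected_base_url.rstrip("/"):
            return None
    return token_data["key"]


token = {"key": "sk-123", "base_url": "https://proxy.example.com"}
print(key_for_server(token, "https://proxy.example.com/"))  # same origin
print(key_for_server(token, "https://evil.example.net"))    # different origin
```

A token file without a `base_url` entry never matches a non-`None` `expected_base_url`, so older token files fail closed when the caller opts in to the check.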
--- a/litellm/litellm_core_utils/cloud_storage_security.py
+++ b/litellm/litellm_core_utils/cloud_storage_security.py
@ -0,0 +1,175 @@
+import posixpath
+import re
+from types import MappingProxyType
+from typing import Any, Mapping, Optional, Sequence, Tuple, cast
+from urllib.parse import quote, unquote
+
+from litellm._uuid import uuid
+
+VERTEX_AI_MANAGED_GCS_PREFIX = "litellm-vertex-files/"
+BEDROCK_MANAGED_S3_BATCH_PREFIX = "litellm-bedrock-files-"
+BEDROCK_MANAGED_S3_UPLOAD_PREFIX = "litellm-bedrock-files/"
+BEDROCK_MANAGED_S3_OUTPUT_PREFIX = "litellm-batch-outputs/"
+BEDROCK_MANAGED_S3_PREFIXES = (
+    BEDROCK_MANAGED_S3_BATCH_PREFIX,
+    BEDROCK_MANAGED_S3_UPLOAD_PREFIX,
+    BEDROCK_MANAGED_S3_OUTPUT_PREFIX,
+)
+_MAPPING_PROXY_TYPE: type = type(MappingProxyType({}))
+
+_SAFE_OBJECT_COMPONENT_PATTERN = re.compile(r"[^A-Za-z0-9._-]+")
+
+
+def sanitize_cloud_object_component(
+    value: Optional[str], fallback: str = "file"
+) -> str:
+    if not isinstance(value, str):
+        return fallback
+
+    component = posixpath.basename(value.replace("\\", "/")).strip()
+    if component in {"", ".", ".."}:
+        return fallback
+
+    component = "".join(
+        "_" if ord(char) < 32 or ord(char) == 127 else char for char in component
+    )
+    component = _SAFE_OBJECT_COMPONENT_PATTERN.sub("_", component)
+    component = component.strip("._")
+    if not component:
+        return fallback
+    return component[:255]
+
+
+def sanitize_cloud_object_path(value: Optional[str], fallback: str = "file") -> str:
+    if not isinstance(value, str):
+        return fallback
+
+    segments = []
+    for segment in value.replace("\\", "/").split("/"):
+        sanitized_segment = sanitize_cloud_object_component(segment, fallback="")
+        if sanitized_segment:
+            segments.append(sanitized_segment)
+
+    if not segments:
+        return fallback
+    return "/".join(segments)
+
+
+def build_managed_cloud_object_name(
+    prefix: str, filename: Optional[str], fallback_filename: str = "file"
+) -> str:
+    safe_filename = sanitize_cloud_object_component(
+        filename, fallback=fallback_filename
+    )
+    return f"{prefix}{uuid.uuid4().hex}-{safe_filename}"
+
+
+def _validate_cloud_object_path(object_name: str) -> None:
+    if not object_name:
+        raise ValueError("Cloud storage object name is required")
+    if object_name.startswith("/"):
+        raise ValueError("Cloud storage object name must be relative")
+    if any(ord(char) < 32 or ord(char) == 127 for char in object_name):
+        raise ValueError("Cloud storage object name contains control characters")
+    segments = object_name.split("/")
+    if any(segment in {".", ".."} for segment in segments):
+        raise ValueError("Cloud storage object name contains an invalid path segment")
+    if "" in segments[:-1]:
+        raise ValueError("Cloud storage object name contains an invalid path segment")
+
+
+def split_configured_cloud_bucket_name(bucket_name: str) -> Tuple[str, str]:
+    if not isinstance(bucket_name, str) or not bucket_name.strip():
+        raise ValueError("Cloud storage bucket name is required")
+
+    bucket_name = bucket_name.strip()
+    if "://" in bucket_name or "?" in bucket_name or "#" in bucket_name:
+        raise ValueError(
+            "Cloud storage bucket name must not include a URI scheme or query"
+        )
+    if any(ord(char) < 32 or ord(char) == 127 for char in bucket_name):
+        raise ValueError("Cloud storage bucket name contains control characters")
+
+    bucket, _, prefix = bucket_name.partition("/")
+    if not bucket:
+        raise ValueError("Cloud storage bucket name is required")
+    if "\\" in bucket:
+        raise ValueError("Cloud storage bucket name contains an invalid separator")
+
+    prefix = prefix.strip("/")
+    if prefix:
+        _validate_cloud_object_path(prefix)
+
+    return bucket, prefix
+
+
+def encode_gcs_object_name_for_url(object_name: str) -> str:
+    return quote(unquote(object_name), safe="")
+
+
+def encode_s3_object_key_for_url(object_key: str) -> str:
+    return quote(unquote(object_key), safe="/")
+
+
+def should_allow_legacy_cloud_file_ids(
+    litellm_params: Optional[Mapping[str, Any]] = None,
+) -> bool:
+    value = None
+    if isinstance(litellm_params, Mapping):
+        trusted_model_credentials = litellm_params.get(
+            "_litellm_internal_model_credentials"
+        )
+        if isinstance(trusted_model_credentials, _MAPPING_PROXY_TYPE):
+            value = cast(Mapping[str, Any], trusted_model_credentials).get(
+                "allow_legacy_cloud_file_ids"
+            )
+
+    if isinstance(value, bool):
+        return value
+    if isinstance(value, str):
+        return value.strip().lower() in {"1", "true", "yes", "on"}
+    return False
+
+
+def validate_managed_cloud_file_id(
+    file_id: str,
+    scheme: str,
+    configured_bucket_name: str,
+    allowed_object_prefixes: Sequence[str],
+    allow_legacy_cloud_file_ids: bool = False,
+) -> Tuple[str, str]:
+    decoded_file_id = unquote(file_id)
+    if not decoded_file_id.startswith(scheme):
+        raise ValueError(f"file_id must be a {scheme} URI")
+
+    full_path = decoded_file_id[len(scheme) :]
+    if "/" not in full_path:
+        raise ValueError("file_id must include a cloud storage object name")
+
+    bucket_name, object_name = full_path.split("/", 1)
+    configured_bucket, configured_prefix = split_configured_cloud_bucket_name(
+        configured_bucket_name
+    )
+    if bucket_name != configured_bucket:
+        raise ValueError("file_id bucket does not match the configured storage bucket")
+
+    _validate_cloud_object_path(object_name)
+    allowed_prefixes = tuple(allowed_object_prefixes)
+    if configured_prefix:
+        allowed_prefixes = tuple(
+            f"{configured_prefix.rstrip('/')}/{prefix}" for prefix in allowed_prefixes
+        )
+
+    if object_name.startswith(allowed_prefixes):
+        return bucket_name, object_name
+
+    if allow_legacy_cloud_file_ids:
+        if configured_prefix and not object_name.startswith(
+            f"{configured_prefix.rstrip('/')}/"
+        ):
+            raise ValueError(
+                "file_id object does not match the configured storage prefix"
+            )
+        return bucket_name, object_name
+
+    raise ValueError("file_id must reference a LiteLLM-managed storage object")
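Reviewer note: a condensed sketch of the filename sanitization in the new module, to show how traversal-style inputs collapse to a single safe component. The name `sanitize_component` is illustrative. The separate control-character pass is omitted because the character-class substitution already replaces those characters.

```python
import posixpath
import re
from typing import Optional

_SAFE = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize_component(value: Optional[str], fallback: str = "file") -> str:
    if not isinstance(value, str):
        return fallback
    # Treat backslashes as separators, then keep only the last path segment.
    component = posixpath.basename(value.replace("\\", "/")).strip()
    if component in {"", ".", ".."}:
        return fallback
    # Collapse every run of disallowed characters into a single underscore,
    # then trim leading/trailing dots and underscores.
    component = _SAFE.sub("_", component).strip("._")
    return component[:255] or fallback


for raw in ["../../etc/passwd", "..\\..\\evil name.pdf", "..", "...", None]:
    print(repr(raw), "->", sanitize_component(raw))
```

Because only the basename survives, the managed prefix plus the random UUID in `build_managed_cloud_object_name` decides where the object lands, not the caller-supplied filename.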
--- a/litellm/litellm_core_utils/exception_mapping_utils.py
+++ b/litellm/litellm_core_utils/exception_mapping_utils.py
@ -6,7 +6,8 @@ from typing import Any, Optional
 import httpx

 import litellm
-from litellm._logging import _redact_string, verbose_logger
+from litellm._logging import _ENABLE_SECRET_REDACTION, _redact_string, verbose_logger
+from litellm.litellm_core_utils.secret_redaction import redact_string
 from litellm.types.utils import LlmProviders

 from ..exceptions import (
@ -261,10 +262,18 @@ def exception_type(  # type: ignore  # noqa: PLR0915
        original_exception=original_exception
    )
    try:
-        error_str = str(original_exception)
+        error_str = (
+            redact_string(str(original_exception))
+            if _ENABLE_SECRET_REDACTION
+            else str(original_exception)
+        )
        if model:
            if hasattr(original_exception, "message"):
-                error_str = str(original_exception.message)
+                error_str = (
+                    redact_string(str(original_exception.message))
+                    if _ENABLE_SECRET_REDACTION
+                    else str(original_exception.message)
+                )
            if isinstance(original_exception, BaseException):
                exception_type = type(original_exception).__name__
            else:
@ -2431,7 +2440,8 @@ def exception_type(  # type: ignore  # noqa: PLR0915
            else:
                raise APIConnectionError(
                    message="{}\n{}".format(
-                        str(original_exception), _redact_string(traceback.format_exc())
+                        str(original_exception),
+                        _redact_string(traceback.format_exc()),
                    ),
                    llm_provider=custom_llm_provider,
                    model=model,
@ -2461,7 +2471,8 @@ def exception_type(  # type: ignore  # noqa: PLR0915
                    raise e  # it's already mapped
            raised_exc = APIConnectionError(
                message="{}\n{}".format(
-                    original_exception, _redact_string(traceback.format_exc())
+                    original_exception,
+                    _redact_string(traceback.format_exc()),
                ),
                llm_provider="",
                model="",
--- a/litellm/litellm_core_utils/initialize_dynamic_callback_params.py
+++ b/litellm/litellm_core_utils/initialize_dynamic_callback_params.py
@ -37,8 +37,6 @@ _supported_callback_params = [
    "langfuse_secret_key",
    "langfuse_host",
    "langfuse_prompt_version",
-    "gcs_bucket_name",
-    "gcs_path_service_account",
    "langsmith_api_key",
    "langsmith_project",
    "langsmith_base_url",
@ -57,6 +55,11 @@ _supported_callback_params = [
    "lunary_public_key",
 ]

+_request_blocked_callback_params = {
+    "gcs_bucket_name",
+    "gcs_path_service_account",
+}
+

 def initialize_standard_callback_dynamic_params(
    kwargs: Optional[Dict] = None,
@ -64,13 +67,15 @@ def initialize_standard_callback_dynamic_params(
    """
    Initialize the standard callback dynamic params from the kwargs

-    checks if langfuse_secret_key, gcs_bucket_name in kwargs and sets the corresponding attributes in StandardCallbackDynamicParams
+    Checks kwargs for supported request callback params and sets the corresponding attributes on StandardCallbackDynamicParams.
    """

    standard_callback_dynamic_params = StandardCallbackDynamicParams()
    if kwargs:
        # 1. Check top-level kwargs
        for param in _supported_callback_params:
+            if param in _request_blocked_callback_params:
+                continue
            if param in kwargs:
                _param_value = kwargs.get(param)
                validate_no_callback_env_reference(
@ -86,6 +91,8 @@ def initialize_standard_callback_dynamic_params(

        if isinstance(metadata, dict):
            for param in _supported_callback_params:
+                if param in _request_blocked_callback_params:
+                    continue
                if param not in standard_callback_dynamic_params and param in metadata:
                    _param_value = metadata.get(param)
                    validate_no_callback_env_reference(
--- a/litellm/litellm_core_utils/litellm_logging.py
+++ b/litellm/litellm_core_utils/litellm_logging.py
@ -3242,10 +3242,15 @@ class Logging(LiteLLMLoggingBaseClass):
                    ),
                    langfuse_secret=self.standard_callback_dynamic_params.get(
                        "langfuse_secret"
-                    ),
+                    )
+                    or self.standard_callback_dynamic_params.get("langfuse_secret_key"),
                    langfuse_host=self.standard_callback_dynamic_params.get(
                        "langfuse_host"
                    ),
+                    allow_env_credentials=self.standard_callback_dynamic_params.get(
+                        "langfuse_host"
+                    )
+                    is None,
                )
            return langFuseLogger

@ -4720,7 +4725,7 @@ class StandardLoggingPayloadSetup:
        ):
            for key, value in litellm_params["metadata"].items():
                # Skip non-serializable objects like UserAPIKeyAuth
-                if key == "user_api_key_auth":
+                if key in {"user_api_key_auth", "user_api_key_budget_reservation"}:
                    continue
                merged_metadata[key] = value

--- a/litellm/litellm_core_utils/llm_request_utils.py
+++ b/litellm/litellm_core_utils/llm_request_utils.py
@ -77,8 +77,8 @@ def get_proxy_server_request_headers(litellm_params: Optional[dict]) -> dict:
    if litellm_params is None:
        return {}

-    proxy_request_headers = (
-        litellm_params.get("proxy_server_request", {}).get("headers", {}) or {}
-    )
+    proxy_request_headers = (litellm_params.get("proxy_server_request") or {}).get(
+        "headers"
+    ) or {}

    return proxy_request_headers
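Reviewer note: the rewrite matters when `proxy_server_request` is present but set to `None`. In that case `dict.get(key, {})` returns `None` rather than the default, and the chained `.get` raises. A standalone sketch with hypothetical function names:

```python
def headers_old(litellm_params: dict) -> dict:
    # The default only applies when the key is missing, not when it is None.
    return litellm_params.get("proxy_server_request", {}).get("headers", {}) or {}


def headers_new(litellm_params: dict) -> dict:
    # `or {}` also covers an explicit None at either level.
    return (litellm_params.get("proxy_server_request") or {}).get("headers") or {}


params = {"proxy_server_request": None}
print(headers_new(params))
try:
    headers_old(params)
except AttributeError as exc:
    print("old form fails:", exc)
```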
--- a/litellm/litellm_core_utils/prompt_templates/factory.py
+++ b/litellm/litellm_core_utils/prompt_templates/factory.py
@ -4582,6 +4582,11 @@ class BedrockConverseMessagesProcessor:
                                    message=cast(ChatCompletionFileObject, element)
                                )
                                _parts.append(_part)
+                            elif element["type"] == "document":
+                                _part = BedrockConverseMessagesProcessor._process_document_message(
+                                    element
+                                )
+                                _parts.append(_part)
                            _cache_point_block = (
                                litellm.AmazonConverseConfig()._get_cache_point_block(
                                    message_block=cast(
@ -4864,6 +4869,44 @@ class BedrockConverseMessagesProcessor:
            image_url=cast(str, file_id or file_data), format=format
        )

+    @staticmethod
+    def _process_document_message(element: dict) -> BedrockContentBlock:
+        """Convert a document content block to a Bedrock DocumentBlock.
+
+        Handles the Anthropic-style document format:
+        {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": "..."}}
+        """
+        source = element["source"]
+        source_type = source.get("type")
+        if source_type != "base64":
+            raise ValueError(
+                f"Bedrock Converse only supports base64-encoded document sources, got '{source_type}'. "
+                "Please convert the document to base64 before sending to Bedrock."
+            )
+        media_type: str = source["media_type"]
+        data: str = source["data"]
+        doc_format = BedrockImageProcessor._validate_format(
+            mime_type=media_type, image_format=media_type.split("/")[1]
+        )
+
+        # Deterministic name using the same hashing pattern as _create_bedrock_block
+        HASH_SAMPLE_BYTES = 64 * 1024
+        normalized = "".join(data.split()).encode("utf-8")
+        sample = normalized[:HASH_SAMPLE_BYTES]
+        hasher = hashlib.sha256()
+        hasher.update(sample)
+        hasher.update(str(len(normalized)).encode("utf-8"))
+        content_hash = hasher.hexdigest()[:16]
+        document_name = f"Document_{content_hash}_{doc_format}"
+
+        return BedrockContentBlock(
+            document=BedrockDocumentBlock(
+                source=BedrockSourceBlock(bytes=data),
+                format=doc_format,
+                name=document_name,
+            )
+        )
+
    @staticmethod
    def add_thinking_blocks_to_assistant_content(
        thinking_blocks: List[BedrockContentBlock],
@ -4961,6 +5004,11 @@ def _bedrock_converse_messages_pt(  # noqa: PLR0915
                                )
                            )
                            _parts.append(_part)
+                        elif element["type"] == "document":
+                            _part = BedrockConverseMessagesProcessor._process_document_message(
+                                element
+                            )
+                            _parts.append(_part)
                        _cache_point_block = (
                            litellm.AmazonConverseConfig()._get_cache_point_block(
                                message_block=cast(
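Reviewer note: the document naming in `_process_document_message` can be reproduced on its own to confirm it is deterministic and insensitive to whitespace in the base64 payload. The function name `document_name` and the sample payloads are illustrative; the format string is passed in directly rather than derived through `BedrockImageProcessor._validate_format`.

```python
import hashlib

HASH_SAMPLE_BYTES = 64 * 1024


def document_name(data: str, doc_format: str) -> str:
    # Strip all whitespace so line-wrapped base64 hashes the same as unwrapped.
    normalized = "".join(data.split()).encode("utf-8")
    hasher = hashlib.sha256()
    # Hash only a bounded prefix plus the total length, which keeps the cost
    # constant for large documents.
    hasher.update(normalized[:HASH_SAMPLE_BYTES])
    hasher.update(str(len(normalized)).encode("utf-8"))
    return f"Document_{hasher.hexdigest()[:16]}_{doc_format}"


print(document_name("JVBERi0x\nLjQK", "pdf") == document_name("JVBERi0xLjQK", "pdf"))
```

One consequence of hashing only a prefix plus the length: two documents that share their first 64 KiB of base64 text and have the same total length receive the same name.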
--- a/litellm/litellm_core_utils/secret_redaction.py
+++ b/litellm/litellm_core_utils/secret_redaction.py
@ -0,0 +1,81 @@
+"""
+Credential/secret redaction utilities.
+
+This module owns the compiled regex and the public `redact_string` helper so
+that any part of the codebase (logging, exception mapping, etc.) can scrub
+secrets from strings without depending on the logging-configuration module.
+"""
+
+import re
+from typing import List
+
+_REDACTED = "REDACTED"
+
+
+def _build_secret_patterns() -> "re.Pattern[str]":
+    patterns: List[str] = [
+        # PEM private key / certificate blocks
+        r"-----BEGIN[A-Z \-]*PRIVATE KEY-----[\s\S]*?-----END[A-Z \-]*PRIVATE KEY-----",
+        # GCP OAuth2 access tokens (ya29.*)
+        r"\bya29\.[A-Za-z0-9_.~+/-]+",
+        # Credentials emitted via %s formatting: name and value separated by whitespace, no key= prefix
+        r"(?:client_secret|azure_password|azure_username)\s+[^\s,'\"})\]{}>]+",
+        # AWS access key IDs
+        r"(?:AKIA|ASIA)[0-9A-Z]{16}",
+        # AWS secrets / session tokens / access key IDs (key=value)
+        r"(?:aws_secret_access_key|aws_session_token|aws_access_key_id)"
+        r"\s*[:=]\s*[A-Za-z0-9/+=]{20,}",
+        # Bearer tokens (OAuth, JWT, etc.)
+        r"Bearer\s+[A-Za-z0-9\-._~+/]{10,}=*",
+        # Basic auth headers
+        r"Basic\s+[A-Za-z0-9+/]{10,}={0,2}",
+        # OpenAI / Anthropic sk- prefixed keys
+        r"sk-[A-Za-z0-9\-_]{20,}",
+        # Generic api_key / api-key / apikey (handles 'key': 'value' dict repr)
+        r"(?:api[_-]?key)['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]{8,}",
+        # x-api-key / api-key header values (handles 'key': 'value' dict repr)
+        r"(?:x-api-key|api-key)['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+",
+        # Anthropic internal header keys
+        r"x-ak-[A-Za-z0-9\-_]{20,}",
+        # Google API keys (bare key value)
+        r"AIza[0-9A-Za-z\-_]{35}",
+        # URL query-param key=VALUE (e.g. ?key=AIza... or &key=...) — catches the
+        # full "key=<secret>" fragment so the value is redacted regardless of format.
+        r"(?<=[?&])key=[^\s&'\"]{8,}",
+        # Password / secret params (handles key=value and 'key': 'value')
+        # Word boundary prevents O(n^2) backtracking on long word-char runs.
+        r"(?:^|(?<=\W))\w*(?:password|passwd|client_secret|secret_key|_secret)"
+        r"['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+",
+        # Database connection string credentials (scheme://user:pass@host)
+        r"(?<=://)[^\s'\"]*:[^\s'\"@]+(?=@)",
+        # Databricks personal access tokens
+        r"dapi[0-9a-f]{32}",
+        # ── Key-name-based redaction ──
+        # Catches secrets inside dicts/config dumps by matching on the KEY name
+        # regardless of what the value looks like.
+        # e.g. 'master_key': 'any-value-here', "database_url": "postgres://..."
+        # private_key with PEM-aware value capture
+        r"""private_key['\"]?\s*[:=]\s*['\"]?(?:-----BEGIN[A-Z \-]*PRIVATE KEY-----[\s\S]*?-----END[A-Z \-]*PRIVATE KEY-----|[^\s,'\"})\]{}>]+)""",
+        r"(?:master_key|database_url|db_url|connection_string|"
+        r"signing_key|encryption_key|"
+        r"auth_token|access_token|refresh_token|"
+        r"slack_webhook_url|webhook_url|"
+        r"database_connection_string|"
+        r"huggingface_token|jwt_secret)"
+        r"""['\"]?\s*[:=]\s*['\"]?[^\s,'\"})\]{}>]+""",
+        # Raw JWTs (without Bearer prefix)
+        r"\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*",
+        # Azure SAS tokens in URLs
+        r"[?&]sig=[A-Za-z0-9%+/=]+",
+        # Full JSON service-account blobs (single-line and multi-line)
+        r'\{[^{}]*"type"\s*:\s*"service_account"[^{}]*(?:\{[^{}]*\}[^{}]*)*\}',
+    ]
+    return re.compile("|".join(patterns), re.IGNORECASE)
+
+
+_SECRET_RE = _build_secret_patterns()
+
+
+def redact_string(value: str) -> str:
+    """Scrub known secret/credential patterns from *value* and return the result."""
+    return _SECRET_RE.sub(_REDACTED, value)
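Reviewer note: a reduced sketch of the same approach (one compiled alternation, one `sub` call) using three of the patterns from the list above. The names `_PATTERNS` and `redact` are illustrative, and the inputs are invented.

```python
import re

_PATTERNS = [
    # Bearer tokens
    r"Bearer\s+[A-Za-z0-9\-._~+/]{10,}=*",
    # sk- prefixed keys
    r"sk-[A-Za-z0-9\-_]{20,}",
    # user:password in a connection string (scheme://user:pass@host)
    r"(?<=://)[^\s'\"]*:[^\s'\"@]+(?=@)",
]
_RE = re.compile("|".join(_PATTERNS), re.IGNORECASE)


def redact(value: str) -> str:
    # Every match of any pattern is replaced by the same marker.
    return _RE.sub("REDACTED", value)


print(redact("Authorization: Bearer abcdefghij1234567890"))
print(redact("postgres://user:pw@db.internal/app"))
```

The lookbehind and lookahead in the connection-string pattern keep the scheme and host in the output, so a redacted message still shows which database was involved.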
--- a/litellm/litellm_core_utils/streaming_handler.py
+++ b/litellm/litellm_core_utils/streaming_handler.py
@ -2244,7 +2244,7 @@ class CustomStreamWrapper:
                asyncio.create_task(
                    self.logging_obj.async_failure_handler(e, traceback_exception)
                )
-            raise e
+            self._handle_stream_fallback_error(e)
        except Exception as e:
            traceback_exception = traceback.format_exc()
            if self.logging_obj is not None:
--- a/litellm/litellm_core_utils/url_utils.py
+++ b/litellm/litellm_core_utils/url_utils.py
@ -21,8 +21,8 @@ Admins can opt out via two ``litellm`` globals (wired from proxy config):

 import socket
 from ipaddress import ip_address, ip_network
-from typing import Any, List, Set, Tuple
-from urllib.parse import urlparse, urlunparse
+from typing import Any, List, Optional, Set, Tuple
+from urllib.parse import quote, urlparse, urlunparse

 import httpx

@ -46,6 +46,46 @@ class SSRFError(ValueError):
    pass


+def encode_url_path_segment(value: Any, *, field_name: str = "path parameter") -> str:
+    """Percent-encode one user-controlled URL path segment.
+
+    ``urllib.parse.quote(..., safe="")`` intentionally leaves RFC 3986
+    unreserved characters such as ``.`` unescaped, so reject standalone dot
+    segments before they can be appended to an upstream URL and normalized by
+    the HTTP client.
+    """
+    if value is None:
+        raise ValueError(f"{field_name} is required")
+
+    value_str = str(value)
+    if value_str == "":
+        raise ValueError(f"{field_name} is required")
+    if value_str in {".", ".."}:
+        raise ValueError(f"{field_name} cannot be a dot path segment")
+
+    return quote(value_str, safe="")
+
+
+def encode_url_path_segments(value: Any, *, field_name: str = "path") -> str:
+    """Percent-encode a user-controlled URL path made of multiple segments.
+
+    Empty segments are rejected, so leading, trailing, or consecutive slashes
+    fail closed instead of being normalized by the HTTP client.
+    """
+    if value is None:
+        raise ValueError(f"{field_name} is required")
+
+    value_str = str(value)
+    if value_str == "":
+        raise ValueError(f"{field_name} is required")
+
+    encoded_segments = []
+    for segment in value_str.split("/"):
+        encoded_segments.append(encode_url_path_segment(segment, field_name=field_name))
+
+    return "/".join(encoded_segments)
+
+
 def _is_blocked_ip(addr: str) -> bool:
    """Return True for any IP not safe to reach from a user-supplied URL.

@ -70,6 +110,85 @@ def _normalize_host(host: str) -> str:
    return host.lower().rstrip(".")


+def _default_port_for_scheme(scheme: str) -> int:
+    return 443 if scheme == "https" else 80
+
+
+def _parse_url_destination_allowlist_entry(
+    entry: str,
+) -> Optional[Tuple[str, Optional[str], Optional[int]]]:
+    """Parse an admin allowlist entry into host, optional scheme, optional port.
+
+    Entries may be bare hosts (``api.example.com``), host+port
+    (``api.example.com:8443``), or origins (``https://api.example.com``).
+    URL paths are intentionally ignored so admins can paste an api_base value.
+    """
+    entry = entry.strip()
+    if not entry:
+        return None
+
+    has_scheme = "://" in entry
+    parsed = urlparse(entry if has_scheme else f"//{entry}")
+    if has_scheme and parsed.scheme not in _ALLOWED_SCHEMES:
+        return None
+    if parsed.username is not None or parsed.password is not None:
+        return None
+    if not parsed.hostname:
+        return None
+
+    try:
+        port = parsed.port
+    except ValueError:
+        return None
+
+    scheme: Optional[str] = parsed.scheme if has_scheme else None
+    if scheme is not None and port is None:
+        port = _default_port_for_scheme(scheme)
+
+    return _normalize_host(parsed.hostname), scheme, port
+
+
+def is_url_destination_allowed_by_host(url: str, allowed_hosts: List[str]) -> bool:
+    """Return True when a credential-bearing provider URL is admin-allowlisted.
+
+    This does not fetch, resolve, or rewrite URLs. It only answers whether the
+    destination origin is explicitly trusted by configuration. Use ``safe_get``
+    for user-controlled content fetches that require SSRF protection.
+    """
+    parsed = urlparse(url)
+    if parsed.scheme not in _ALLOWED_SCHEMES:
+        return False
+    if parsed.username is not None or parsed.password is not None:
+        return False
+    if not parsed.hostname:
+        return False
+
+    try:
+        effective_port = parsed.port if parsed.port is not None else _default_port_for_scheme(parsed.scheme)
+    except ValueError:
+        return False
+
+    normalized_host = _normalize_host(parsed.hostname)
+    configured_entries = (
+        [allowed_hosts] if isinstance(allowed_hosts, str) else allowed_hosts
+    )
+    for entry in configured_entries or []:
+        if not isinstance(entry, str):
+            continue
+        parsed_entry = _parse_url_destination_allowlist_entry(entry)
+        if parsed_entry is None:
+            continue
+        allowed_host, allowed_scheme, allowed_port = parsed_entry
+        if allowed_host != normalized_host:
+            continue
+        if allowed_scheme is not None and allowed_scheme != parsed.scheme:
+            continue
+        if allowed_port is not None and allowed_port != effective_port:
+            continue
+        return True
+    return False
+
+
 def _format_host_header(hostname: str, port: int, default_port: int) -> str:
    """Build an RFC 7230 Host header value, bracketing IPv6 literals."""
    bracketed = f"[{hostname}]" if ":" in hostname else hostname
@ -145,7 +264,7 @@ def validate_url(url: str) -> Tuple[str, str]:
        raise SSRFError("URL has no hostname")

    port = parsed.port
-    default_port = 443 if parsed.scheme == "https" else 80
+    default_port = _default_port_for_scheme(parsed.scheme)
    effective_port = port if port is not None else default_port
    host_header = _format_host_header(hostname, effective_port, default_port)

@ -199,13 +318,54 @@ def validate_url(url: str) -> Tuple[str, str]:
    return rewritten, host_header


+def assert_same_origin(candidate_url: str, expected_url: str) -> None:
+    """Verify ``candidate_url`` shares scheme, host, and port with ``expected_url``.
+
+    Use when an upstream API returns a URL meant for follow-up requests
+    (e.g. an async-job polling URL that will be hit with the operator's
+    API key in the headers). The upstream is trusted because the operator
+    configured ``api_base``, but the URL it hands back must actually point
+    back at the same origin or we'd be blindly forwarding credentials
+    wherever the upstream told us to.
+
+    Hostnames are compared case-insensitively. Default ports are made
+    explicit (HTTP→80, HTTPS→443) so ``https://api.example.com:443/...``
+    and ``https://api.example.com/...`` are treated as the same origin.
+
+    Error messages identify *which* component mismatched but never echo
+    the operator's ``expected`` host or the candidate's hostname back to
+    the caller — in the SSRF threat model the caller is the attacker,
+    and reflecting host info would be a secondary leak of operator
+    infrastructure details.
+    """
+    candidate = urlparse(candidate_url)
+    expected = urlparse(expected_url)
+
+    if candidate.scheme not in _ALLOWED_SCHEMES:
+        raise SSRFError("URL scheme is not allowed")
+
+    if candidate.scheme != expected.scheme:
+        raise SSRFError("Origin mismatch on scheme")
+
+    candidate_host = _normalize_host(candidate.hostname or "")
+    expected_host = _normalize_host(expected.hostname or "")
+    if not candidate_host or candidate_host != expected_host:
+        raise SSRFError("Origin mismatch on host")
+
+    default_port = _default_port_for_scheme(candidate.scheme)
+    candidate_port = candidate.port if candidate.port is not None else default_port
+    expected_port = expected.port if expected.port is not None else default_port
+    if candidate_port != expected_port:
+        raise SSRFError("Origin mismatch on port")
+
+
 _MAX_REDIRECTS = 10


 def _extract_redirect_url(response: Any, request_url: str) -> str:
    """Extract and resolve the redirect target from a response's Location header."""
    location = response.headers.get("location")
-    if not location:
+    if not isinstance(location, str) or not location:
        raise SSRFError("Redirect response has no Location header")
    # Resolve relative URLs against the request URL
    return str(httpx.URL(request_url).join(location))
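
Reviewer note on `assert_same_origin`: the comparison it performs can be sketched with only `urllib.parse`. The helper names `origin_of` and `same_origin` and the example hosts are hypothetical. The sketch returns a boolean instead of raising `SSRFError` and skips the allowed-scheme check, so it illustrates the normalization rules rather than reproducing the library code.

```python
from urllib.parse import urlparse
from typing import Tuple


def origin_of(url: str) -> Tuple[str, str, int]:
    """Return (scheme, host, port) with the default port made explicit."""
    parsed = urlparse(url)
    scheme = parsed.scheme.lower()
    # Hostnames compare case-insensitively; a trailing dot is ignored.
    host = (parsed.hostname or "").lower().rstrip(".")
    default_port = 443 if scheme == "https" else 80
    port = parsed.port if parsed.port is not None else default_port
    return scheme, host, port


def same_origin(candidate_url: str, expected_url: str) -> bool:
    """True only when scheme, host, and effective port all match."""
    candidate = origin_of(candidate_url)
    # An empty host never matches, even against another empty host.
    return bool(candidate[1]) and candidate == origin_of(expected_url)


# A polling URL on the configured origin is accepted, others are not.
print(same_origin("https://api.example.com:443/jobs/1", "https://API.example.com/v1"))
print(same_origin("https://evil.example.net/jobs/1", "https://api.example.com/v1"))
```

Making the default port explicit before comparing is what lets `:443` and the bare HTTPS form count as one origin.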
--- a/litellm/llms/anthropic/batches/transformation.py
+++ b/litellm/llms/anthropic/batches/transformation.py
@ -5,6 +5,7 @@ from typing import TYPE_CHECKING, Any, Dict, List, Literal, Optional, Union, cas
 import httpx
 from httpx import Headers, Response

+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.llms.base_llm.batches.transformation import BaseBatchesConfig
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
 from litellm.types.llms.openai import AllMessageValues, CreateBatchRequest
@ -122,7 +123,8 @@ class AnthropicBatchesConfig(BaseBatchesConfig):
            Complete URL for Anthropic batch retrieval: {api_base}/v1/messages/batches/{batch_id}
        """
        api_base = api_base or self.anthropic_model_info.get_api_base(api_base)
-        return f"{api_base.rstrip('/')}/v1/messages/batches/{batch_id}"
+        encoded_batch_id = encode_url_path_segment(batch_id, field_name="batch_id")
+        return f"{api_base.rstrip('/')}/v1/messages/batches/{encoded_batch_id}"

    def transform_retrieve_batch_request(
        self,
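
Reviewer note on the `batch_id` encoding above: a standalone sketch of the same idea, assuming the behaviour shown in the `encode_url_path_segment` hunk. `encode_segment`, `build_batch_url`, and the example base URL are hypothetical names used only for illustration.

```python
from urllib.parse import quote


def encode_segment(value: object, field_name: str = "path parameter") -> str:
    """Percent-encode one path segment, rejecting empty and dot segments."""
    if value is None or str(value) == "":
        raise ValueError(f"{field_name} is required")
    text = str(value)
    if text in {".", ".."}:
        # quote() leaves "." unescaped, so dot segments are rejected outright.
        raise ValueError(f"{field_name} cannot be a dot path segment")
    # safe="" also encodes "/", so the value cannot add extra path segments.
    return quote(text, safe="")


def build_batch_url(api_base: str, batch_id: str) -> str:
    """Build a batch retrieval URL from an encoded id (hypothetical helper)."""
    encoded = encode_segment(batch_id, field_name="batch_id")
    return f"{api_base.rstrip('/')}/v1/messages/batches/{encoded}"


print(build_batch_url("https://api.example.com/", "msgbatch_01/../../admin"))
```

With `safe=""`, a traversal attempt stays inside a single encoded segment instead of being normalized into a different upstream path.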
--- a/litellm/llms/anthropic/chat/transformation.py
+++ b/litellm/llms/anthropic/chat/transformation.py
@ -1,18 +1,31 @@
 import json
 import re
 import time
-from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union, cast
+from typing import (
+    TYPE_CHECKING,
+    Any,
+    Dict,
+    List,
+    NoReturn,
+    Optional,
+    Tuple,
+    Union,
+    cast,
+)

 import httpx

 import litellm
 from litellm.constants import (
+    ANTHROPIC_MIN_THINKING_BUDGET_TOKENS,
    ANTHROPIC_WEB_SEARCH_TOOL_MAX_USES,
    DEFAULT_ANTHROPIC_CHAT_MAX_TOKENS,
    DEFAULT_REASONING_EFFORT_HIGH_THINKING_BUDGET,
    DEFAULT_REASONING_EFFORT_LOW_THINKING_BUDGET,
+    DEFAULT_REASONING_EFFORT_MAX_THINKING_BUDGET,
    DEFAULT_REASONING_EFFORT_MEDIUM_THINKING_BUDGET,
    DEFAULT_REASONING_EFFORT_MINIMAL_THINKING_BUDGET,
+    DEFAULT_REASONING_EFFORT_XHIGH_THINKING_BUDGET,
    RESPONSE_FORMAT_TOOL_NAME,
 )
 from litellm.litellm_core_utils.core_helpers import map_finish_reason
@ -92,6 +105,22 @@ else:
    LoggingClass = Any


+REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT: Dict[str, str] = {
+    "low": "low",
+    "minimal": "low",
+    "medium": "medium",
+    "high": "high",
+    "xhigh": "xhigh",
+    "max": "max",
+}
+
+DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING = (
+    "Dropping unsupported `output_config` for model=%s "
+    "(drop_params=True). Effort is only supported on Opus 4.5+, "
+    "Sonnet 4.6+, and Mythos Preview."
+)
+
+
 class AnthropicConfig(AnthropicModelInfo, BaseConfig):
    """
    Reference: https://docs.anthropic.com/claude/reference/messages_post
@ -202,17 +231,96 @@ class AnthropicConfig(AnthropicModelInfo, BaseConfig):
    def _supports_effort_level(model: str, level: str) -> bool:
        """Check ``supports_{level}_reasoning_effort`` in the model map.

-        Mirrors the pattern used in ``openai/chat/gpt_5_transformation.py`` so
-        that adding support for a new effort level is a pure model-map change.
+        Strips bedrock/vertex prefixes so a provider-routed Claude still
+        resolves to the Anthropic model-map entry.
        """
+        key = f"supports_{level}_reasoning_effort"
        try:
-            return _supports_factory(
+            if _supports_factory(
                model=model,
                custom_llm_provider="anthropic",
-                key=f"supports_{level}_reasoning_effort",
-            )
+                key=key,
+            ):
+                return True
        except Exception:
-            return False
+            pass
+        candidates = [model]
+        for prefix in (
+            "bedrock/converse/",
+            "bedrock/invoke/",
+            "bedrock/",
+            "vertex_ai/",
+        ):
+            if model.startswith(prefix):
+                candidates.append(model[len(prefix) :])
+        try:
+            from litellm.llms.bedrock.common_utils import BedrockModelInfo
+
+            base = BedrockModelInfo.get_base_model(model)
+            if base:
+                candidates.append(base)
+                candidates.append(f"bedrock/{base}")
+        except Exception:
+            pass
+        try:
+            import litellm
+
+            for cand in candidates:
+                if cand in litellm.model_cost and (
+                    litellm.model_cost[cand].get(key) is True
+                ):
+                    return True
+        except Exception:
+            pass
+        return False
+
+    @staticmethod
+    def _validate_effort_for_model(model: str, effort: Optional[str]) -> Optional[str]:
+        """Return ``None`` if ``effort`` is allowed on ``model``, else an error message."""
+        if effort == "max" and not (
+            AnthropicConfig._is_claude_4_6_model(model)
+            or AnthropicConfig._is_claude_4_7_model(model)
+            or AnthropicConfig._supports_effort_level(model, "max")
+        ):
+            return f"effort='max' is not supported by this model. Got model: {model}"
+        if effort == "xhigh" and not AnthropicConfig._supports_effort_level(
+            model, "xhigh"
+        ):
+            return f"effort='xhigh' is not supported by this model. Got model: {model}"
+        return None
+
+    @staticmethod
+    def _model_supports_effort_param(model: str) -> bool:
+        """Whether the model accepts ``output_config.effort`` at all."""
+        return any(
+            AnthropicConfig._supports_effort_level(model, level)
+            for level in ("low", "minimal", "medium", "high", "xhigh", "max")
+        )
+
+    @staticmethod
+    def _raise_invalid_reasoning_effort(
+        model: str, value: Any, llm_provider: str
+    ) -> NoReturn:
+        """Raise a ``BadRequestError`` for an unrecognised ``reasoning_effort``.
+
+        Args:
+            model: The model id the request was routed to (surfaced in the error).
+            value: The offending ``reasoning_effort`` value supplied by the caller.
+            llm_provider: Provider tag for the raised exception (``"anthropic"``,
+                ``"bedrock_converse"``, ``"databricks"``, ...).
+
+        Raises:
+            litellm.exceptions.BadRequestError: Always.
+        """
+        raise litellm.exceptions.BadRequestError(
+            message=(
+                f"Invalid reasoning_effort: {value!r}. "
+                f"Must be one of: 'minimal', 'low', 'medium', "
+                f"'high', 'xhigh', 'max', 'none'"
+            ),
+            model=model,
+            llm_provider=llm_provider,
+        )

    def get_supported_openai_params(self, model: str):
        params = [
@ -794,12 +902,11 @@ class AnthropicConfig(AnthropicModelInfo, BaseConfig):
    def _map_reasoning_effort(
        reasoning_effort: Optional[Union[REASONING_EFFORT, str]],
        model: str,
+        llm_provider: str = "anthropic",
    ) -> Optional[AnthropicThinkingParam]:
        if reasoning_effort is None or reasoning_effort == "none":
            return None
-        if AnthropicConfig._is_claude_4_6_model(
-            model
-        ) or AnthropicConfig._is_claude_4_7_model(model):
+        if AnthropicConfig._is_adaptive_thinking_model(model):
            return AnthropicThinkingParam(
                type="adaptive",
            )
@ -818,13 +925,34 @@ class AnthropicConfig(AnthropicModelInfo, BaseConfig):
                type="enabled",
                budget_tokens=DEFAULT_REASONING_EFFORT_HIGH_THINKING_BUDGET,
            )
+        elif reasoning_effort == "xhigh":
+            return AnthropicThinkingParam(
+                type="enabled",
+                budget_tokens=DEFAULT_REASONING_EFFORT_XHIGH_THINKING_BUDGET,
+            )
+        elif reasoning_effort == "max":
+            return AnthropicThinkingParam(
+                type="enabled",
+                budget_tokens=DEFAULT_REASONING_EFFORT_MAX_THINKING_BUDGET,
+            )
        elif reasoning_effort == "minimal":
            return AnthropicThinkingParam(
                type="enabled",
-                budget_tokens=DEFAULT_REASONING_EFFORT_MINIMAL_THINKING_BUDGET,
+                budget_tokens=max(
+                    DEFAULT_REASONING_EFFORT_MINIMAL_THINKING_BUDGET,
+                    ANTHROPIC_MIN_THINKING_BUDGET_TOKENS,
+                ),
            )
        else:
-            raise ValueError(f"Unmapped reasoning effort: {reasoning_effort}")
+            raise litellm.exceptions.BadRequestError(
+                message=(
+                    f"Unmapped reasoning effort: {reasoning_effort!r}. "
+                    f"Must be one of: 'minimal', 'low', 'medium', 'high', "
+                    f"'xhigh', 'max', 'none'."
+                ),
+                model=model,
+                llm_provider=llm_provider,
+            )

    def _extract_json_schema_from_response_format(
        self, value: Optional[dict]
@ -1088,24 +1216,27 @@ class AnthropicConfig(AnthropicModelInfo, BaseConfig):
            elif param == "thinking":
                optional_params["thinking"] = value
            elif param == "reasoning_effort" and isinstance(value, str):
-                optional_params["thinking"] = AnthropicConfig._map_reasoning_effort(
-                    reasoning_effort=value, model=model
+                mapped_thinking = AnthropicConfig._map_reasoning_effort(
+                    reasoning_effort=value,
+                    model=model,
+                    llm_provider=self.custom_llm_provider or "anthropic",
                )
-                # For Claude 4.6+ models, effort is controlled via output_config,
-                # not thinking budget_tokens. Map reasoning_effort to output_config.
-                if AnthropicConfig._is_claude_4_6_model(
-                    model
-                ) or AnthropicConfig._is_claude_4_7_model(model):
-                    effort_map = {
-                        "low": "low",
-                        "minimal": "low",
-                        "medium": "medium",
-                        "high": "high",
-                        "xhigh": "xhigh",
-                        "max": "max",
-                    }
-                    mapped_effort = effort_map.get(value, value)
-                    optional_params["output_config"] = {"effort": mapped_effort}
+                if mapped_thinking is None:
+                    optional_params.pop("thinking", None)
+                    optional_params.pop("output_config", None)
+                else:
+                    optional_params["thinking"] = mapped_thinking
+                    if AnthropicConfig._is_adaptive_thinking_model(model):
+                        mapped_effort = REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(
+                            value
+                        )
+                        if mapped_effort is None:
+                            AnthropicConfig._raise_invalid_reasoning_effort(
+                                model=model,
+                                value=value,
+                                llm_provider=self.custom_llm_provider or "anthropic",
+                            )
+                        optional_params["output_config"] = {"effort": mapped_effort}
            elif param == "web_search_options" and isinstance(value, dict):
                hosted_web_search_tool = self.map_web_search_tool(
                    cast(OpenAIWebSearchOptions, value)
@ -1527,29 +1658,31 @@ class AnthropicConfig(AnthropicModelInfo, BaseConfig):
        output_config = optional_params.get("output_config")
        if not output_config or not isinstance(output_config, dict):
            return
+        if litellm.drop_params is True and not self._model_supports_effort_param(model):
+            litellm.verbose_logger.warning(
+                DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING,
+                model,
+            )
+            optional_params.pop("output_config", None)
+            data.pop("output_config", None)
+            return
        effort = output_config.get("effort")
        valid_efforts = ["high", "medium", "low", "xhigh", "max"]
-        if effort and effort not in valid_efforts:
-            raise ValueError(
-                f"Invalid effort value: {effort}. Must be one of: "
-                f"'high', 'medium', 'low', 'xhigh', 'max'"
+        if effort is not None and effort not in valid_efforts:
+            raise litellm.exceptions.BadRequestError(
+                message=(
+                    f"Invalid effort value: {effort!r}. Must be one of: "
+                    f"'high', 'medium', 'low', 'xhigh', 'max'"
+                ),
+                model=model,
+                llm_provider=self.custom_llm_provider or "anthropic",
            )
-        # ``max`` is for Opus 4.6+ output effort (not Sonnet 4.6, not Opus 4.5).
-        # Accept known Opus 4.6/4.7 id patterns and/or ``supports_max_reasoning_effort``
-        # in the model map (same pattern as ``xhigh`` below).
-        if effort == "max" and not (
-            self._is_opus_4_6_model(model)
-            or self._is_opus_4_7_model(model)
-            or self._supports_effort_level(model, "max")
-        ):
-            raise ValueError(
-                f"effort='max' is not supported by this model. Got model: {model}"
-            )
-        # ``xhigh`` is data-driven via ``supports_xhigh_reasoning_effort`` so
-        # enabling it for a new model is a pure model-map change.
-        if effort == "xhigh" and not self._supports_effort_level(model, "xhigh"):
-            raise ValueError(
-                f"effort='xhigh' is not supported by this model. Got model: {model}"
+        gate_error = self._validate_effort_for_model(model, effort)
+        if gate_error is not None:
+            raise litellm.exceptions.BadRequestError(
+                message=gate_error,
+                model=model,
+                llm_provider=self.custom_llm_provider or "anthropic",
            )
        data["output_config"] = output_config

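Reviewer note on the `output_config` gate above: the order of checks (drop, then validate, then gate) can be sketched as a simplified standalone function. `check_output_config` and `supported_levels` are hypothetical; the set stands in for the model-map and model-name lookups, and `ValueError` stands in for `BadRequestError`.

```python
from typing import Dict, Optional, Set

VALID_EFFORTS = ("high", "medium", "low", "xhigh", "max")
GATED_EFFORTS = {"xhigh", "max"}  # levels that need explicit model support


def check_output_config(
    output_config: Dict[str, str],
    supported_levels: Set[str],
    drop_params: bool,
) -> Optional[Dict[str, str]]:
    """Return the config to forward, or None when it should be dropped.

    `supported_levels` is a hypothetical stand-in for the model-map lookups.
    """
    if drop_params and not supported_levels:
        # The model accepts no effort level at all: drop instead of failing.
        return None
    effort = output_config.get("effort")
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"Invalid effort value: {effort!r}")
    if effort in GATED_EFFORTS and effort not in supported_levels:
        raise ValueError(f"effort={effort!r} is not supported by this model")
    return output_config


print(check_output_config({"effort": "max"}, set(), drop_params=True))
print(check_output_config({"effort": "high"}, {"high"}, drop_params=False))
```

Dropping happens before validation, so with `drop_params` enabled an unsupported model never sees an effort-related error.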
--- a/litellm/llms/anthropic/common_utils.py
+++ b/litellm/llms/anthropic/common_utils.py
@ -273,7 +273,18 @@ class AnthropicModelInfo(BaseLLMModelInfo):

    @staticmethod
    def _is_adaptive_thinking_model(model: str) -> bool:
-        """Claude 4.6+ models use adaptive thinking with output_config effort."""
+        """Claude 4.6+ models use adaptive thinking with ``output_config.effort``."""
+        from litellm.utils import _supports_factory
+
+        try:
+            if _supports_factory(
+                model=model,
+                custom_llm_provider=None,
+                key="supports_adaptive_thinking",
+            ):
+                return True
+        except Exception:
+            pass
        return AnthropicModelInfo._is_claude_4_6_model(
            model
        ) or AnthropicModelInfo._is_claude_4_7_model(model)
--- a/litellm/llms/anthropic/experimental_pass_through/adapters/handler.py
+++ b/litellm/llms/anthropic/experimental_pass_through/adapters/handler.py
@ -27,6 +27,16 @@ from litellm.utils import get_model_info
 if TYPE_CHECKING:
    pass

+
+# Anthropic-only fields that the adapter's translator already maps into the
+# OpenAI-format completion_kwargs (output_config → reasoning_effort /
+# response_format, etc.). They must be filtered out of the raw
+# extra_kwargs re-merge below, or non-Anthropic backends reject the call
+# with 400 "Extra inputs are not permitted". Add new entries here when
+# extending AnthropicMessagesRequestOptionalParams with another
+# Anthropic-specific key.
+ANTHROPIC_ONLY_REQUEST_KEYS: frozenset[str] = frozenset({"output_config"})
+
 ########################################################
 # init adapter
 ANTHROPIC_ADAPTER = AnthropicAdapter()
@ -202,8 +212,12 @@ class LiteLLMMessagesToCompletionTransformationHandler:
            request_data["output_format"] = output_format

        # Extract output_config from extra_kwargs so the translator can use it
-        # (e.g. output_config.effort for adaptive thinking → reasoning_effort)
-        extra_kwargs = extra_kwargs or {}
+        # (e.g. output_config.effort for adaptive thinking → reasoning_effort,
+        # output_config.format → response_format for structured outputs).
+        # Use an explicit None check rather than `or {}` so that an empty dict
+        # passed by the caller is preserved as the same object (matters for
+        # tests that drive the fallback inference path).
+        extra_kwargs = extra_kwargs if extra_kwargs is not None else {}
        if "output_config" in extra_kwargs:
            request_data["output_config"] = extra_kwargs["output_config"]

@ -225,8 +239,23 @@ class LiteLLMMessagesToCompletionTransformationHandler:
                "include_usage": True,
            }

-        excluded_keys = {"anthropic_messages"}
-        extra_kwargs = extra_kwargs or {}
+        # Keys that must NOT be forwarded as raw extras into the OpenAI-format
+        # ``completion_kwargs`` after translation. The translator above has
+        # already consumed the meaningful parts of these inputs (e.g.
+        # ``output_config.format`` → ``response_format``, ``output_config.effort``
+        # → ``reasoning_effort`` for non-Claude targets). Re-adding the raw
+        # Anthropic-shaped key here causes 400 "Extra inputs are not permitted"
+        # on non-Anthropic backends (Azure OpenAI, Fireworks, Bedrock Nova,
+        # etc.) and is silently lossy on Anthropic-family targets, which would
+        # see the translated key ``response_format`` AND a duplicate, conflicting
+        # ``output_config``.
+        #
+        # Maintainability: when adding a new Anthropic-only request param to
+        # ``AnthropicMessagesRequestOptionalParams``, also extend
+        # ``ANTHROPIC_ONLY_REQUEST_KEYS`` here so it doesn't silently leak.
+        excluded_keys = ANTHROPIC_ONLY_REQUEST_KEYS | {"anthropic_messages"}
+        # NOTE: extra_kwargs was already coerced from None to {} near the top
+        # of this method, so it is guaranteed to be a dict here.
        for key, value in extra_kwargs.items():
            if (
                key == "litellm_logging_obj"
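
Reviewer note on the `excluded_keys` change above: a minimal sketch of the filtering step, assuming only what the hunk shows. `forwardable_extras` is a hypothetical helper, and the additional conditions in the real loop (such as the `litellm_logging_obj` check) are left out.

```python
from typing import Any, Dict, Optional

ANTHROPIC_ONLY_REQUEST_KEYS = frozenset({"output_config"})


def forwardable_extras(extra_kwargs: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the extras that may be merged into OpenAI-format kwargs."""
    # Explicit None check: an empty dict from the caller is a valid value.
    extras = extra_kwargs if extra_kwargs is not None else {}
    excluded = ANTHROPIC_ONLY_REQUEST_KEYS | {"anthropic_messages"}
    return {key: value for key, value in extras.items() if key not in excluded}


print(forwardable_extras({"output_config": {"effort": "low"}, "user": "u1"}))
```

Keeping the Anthropic-only names in one module-level frozenset means a new key needs a single edit rather than a change inside the loop.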
--- a/litellm/llms/anthropic/experimental_pass_through/adapters/transformation.py
+++ b/litellm/llms/anthropic/experimental_pass_through/adapters/transformation.py
@ -667,7 +667,7 @@ class LiteLLMAnthropicMessagesAdapter:

    @staticmethod
    def translate_anthropic_thinking_to_reasoning_effort(
-        thinking: Dict[str, Any]
+        thinking: Dict[str, Any],
    ) -> Optional[str]:
        """
        Translate Anthropic's thinking parameter to OpenAI's reasoning_effort.
@ -1084,10 +1084,23 @@ class LiteLLMAnthropicMessagesAdapter:
        anthropic_message_request: AnthropicMessagesRequest,
        new_kwargs: ChatCompletionRequest,
    ) -> None:
-        """Translate output_format to response_format when applicable."""
-        if "output_format" not in anthropic_message_request:
-            return
-        output_format = anthropic_message_request["output_format"]
+        """Translate Anthropic structured-output config to OpenAI ``response_format``.
+
+        Accepts either the legacy top-level ``output_format`` field OR the
+        newer ``output_config.format`` (sub-key on ``output_config``) so that
+        both shapes flow through to non-Anthropic backends as
+        ``response_format``. Without the ``output_config.format`` branch,
+        callers using the new Anthropic Structured Outputs API would have
+        their schema silently dropped on the adapter path — only the legacy
+        top-level ``output_format`` was being mapped.
+
+        ``output_format`` takes precedence when both are provided.
+        """
+        output_format: Any = anthropic_message_request.get("output_format")
+        if not output_format:
+            output_config = anthropic_message_request.get("output_config")
+            if isinstance(output_config, dict):
+                output_format = output_config.get("format")
        if not output_format:
            return
        response_format = self.translate_anthropic_output_format_to_openai(
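
Reviewer note on the precedence rule in the docstring above: the lookup order can be sketched on plain dicts. `resolve_output_format` is a hypothetical name, and the schema payloads are placeholders rather than real Anthropic request bodies.

```python
from typing import Any, Dict, Optional


def resolve_output_format(request: Dict[str, Any]) -> Optional[Any]:
    """Pick the structured-output spec; top-level `output_format` wins."""
    output_format = request.get("output_format")
    if not output_format:
        # Fall back to the newer nested shape: output_config.format.
        output_config = request.get("output_config")
        if isinstance(output_config, dict):
            output_format = output_config.get("format")
    return output_format or None


legacy = {"output_format": {"type": "json_schema", "schema": {"title": "A"}}}
nested = {"output_config": {"format": {"type": "json_schema", "schema": {"title": "B"}}}}
print(resolve_output_format(legacy))
print(resolve_output_format(nested))
```

Because the fallback only runs when the top-level field is missing or empty, a request that sets both shapes resolves to `output_format`.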
--- a/litellm/llms/anthropic/experimental_pass_through/messages/transformation.py
+++ b/litellm/llms/anthropic/experimental_pass_through/messages/transformation.py
@ -47,6 +47,7 @@ class AnthropicMessagesConfig(BaseAnthropicMessagesConfig):
            "inference_geo",
            "speed",
            "output_config",
+            "reasoning_effort",
            # TODO: Add Anthropic `metadata` support
            # "metadata",
        ]
@ -166,6 +167,62 @@ class AnthropicMessagesConfig(BaseAnthropicMessagesConfig):

        return headers, api_base

+    @staticmethod
+    def _translate_reasoning_effort_to_anthropic(
+        model: str, optional_params: Dict
+    ) -> None:
+        """Map OpenAI-style ``reasoning_effort`` to native Anthropic params.
+
+        Caller-supplied ``thinking`` / ``output_config`` win over the alias.
+        ``effort='none'`` clears both. Invalid efforts raise a 400.
+        """
+        from litellm.exceptions import BadRequestError as _BadRequestError
+        from litellm.llms.anthropic.chat.transformation import (
+            REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT,
+            AnthropicConfig,
+        )
+
+        reasoning_effort = optional_params.pop("reasoning_effort", None)
+        if not isinstance(reasoning_effort, str):
+            return
+
+        try:
+            mapped_thinking = AnthropicConfig._map_reasoning_effort(
+                reasoning_effort=reasoning_effort, model=model
+            )
+        except _BadRequestError as e:
+            raise AnthropicError(message=str(e.message), status_code=400)
+
+        if mapped_thinking is None:
+            optional_params.pop("thinking", None)
+            optional_params.pop("output_config", None)
+            return
+
+        optional_params.setdefault("thinking", mapped_thinking)
+        if AnthropicModelInfo._is_adaptive_thinking_model(model):
+            mapped_effort = REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(
+                reasoning_effort
+            )
+            if mapped_effort is None:
+                raise AnthropicError(
+                    message=(
+                        f"Invalid reasoning_effort: {reasoning_effort!r}. "
+                        f"Must be one of: 'minimal', 'low', 'medium', 'high', "
+                        f"'xhigh', 'max', 'none'"
+                    ),
+                    status_code=400,
+                )
+            gate_error = AnthropicConfig._validate_effort_for_model(
+                model, mapped_effort
+            )
+            if gate_error is not None:
+                raise AnthropicError(message=gate_error, status_code=400)
+            existing_output_config = optional_params.get("output_config")
+            if not isinstance(existing_output_config, dict):
+                existing_output_config = {}
+            existing_output_config.setdefault("effort", mapped_effort)
+            optional_params["output_config"] = existing_output_config
+
    @staticmethod
    def _translate_legacy_thinking_for_adaptive_model(
        model: str, optional_params: Dict
@ -217,6 +274,11 @@ class AnthropicMessagesConfig(BaseAnthropicMessagesConfig):
                status_code=400,
            )

+        self._translate_reasoning_effort_to_anthropic(
+            model=model,
+            optional_params=anthropic_messages_optional_request_params,
+        )
+
        self._translate_legacy_thinking_for_adaptive_model(
            model=model,
            optional_params=anthropic_messages_optional_request_params,
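
Reviewer note on `_translate_reasoning_effort_to_anthropic`: the "caller-supplied values win" behaviour comes from `setdefault`. The sketch below covers only the adaptive-thinking path; `apply_reasoning_effort` is a hypothetical name, `ValueError` stands in for `AnthropicError`, and the per-model effort gate is omitted.

```python
from typing import Any, Dict

EFFORT_ALIASES = {
    "low": "low",
    "minimal": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "xhigh",
    "max": "max",
}


def apply_reasoning_effort(optional_params: Dict[str, Any]) -> None:
    """Translate `reasoning_effort` in place (adaptive-model path only)."""
    effort = optional_params.pop("reasoning_effort", None)
    if not isinstance(effort, str):
        return
    if effort == "none":
        # "none" clears both native params.
        optional_params.pop("thinking", None)
        optional_params.pop("output_config", None)
        return
    mapped = EFFORT_ALIASES.get(effort)
    if mapped is None:
        raise ValueError(f"Invalid reasoning_effort: {effort!r}")
    # setdefault keeps any thinking/effort the caller already supplied.
    optional_params.setdefault("thinking", {"type": "adaptive"})
    output_config = optional_params.get("output_config")
    if not isinstance(output_config, dict):
        output_config = {}
    output_config.setdefault("effort", mapped)
    optional_params["output_config"] = output_config


params = {"reasoning_effort": "high", "output_config": {"effort": "low"}}
apply_reasoning_effort(params)
print(params)
```

In the usage above the caller's explicit `effort` of `low` survives, and the alias only fills in the missing `thinking` entry.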
--- a/litellm/llms/anthropic/files/handler.py
+++ b/litellm/llms/anthropic/files/handler.py
@ -9,6 +9,7 @@ import litellm
 from litellm._logging import verbose_logger
 from litellm._uuid import uuid
 from litellm.litellm_core_utils.litellm_logging import Logging
+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.llms.custom_httpx.http_handler import get_async_httpx_client
 from litellm.types.llms.openai import (
    FileContentRequest,
@ -89,7 +90,10 @@ class AnthropicFilesHandler:
            raise ValueError("Missing Anthropic API Key")

        # Construct the Anthropic batch results URL
-        results_url = f"{api_base.rstrip('/')}/v1/messages/batches/{batch_id}/results"
+        encoded_batch_id = encode_url_path_segment(batch_id, field_name="batch_id")
+        results_url = (
+            f"{api_base.rstrip('/')}/v1/messages/batches/{encoded_batch_id}/results"
+        )

        # Prepare headers
        headers = {
--- a/litellm/llms/anthropic/files/transformation.py
+++ b/litellm/llms/anthropic/files/transformation.py
@ -19,6 +19,7 @@ from typing import Any, Dict, List, Optional, Union, cast
 import httpx
 from openai.types.file_deleted import FileDeleted

+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.litellm_core_utils.prompt_templates.common_utils import extract_file_data
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
 from litellm.llms.base_llm.files.transformation import (
@ -185,7 +186,8 @@ class AnthropicFilesConfig(BaseFilesConfig):
            AnthropicModelInfo.get_api_base(litellm_params.get("api_base"))
            or ANTHROPIC_FILES_API_BASE
        )
-        return f"{api_base.rstrip('/')}/v1/files/{file_id}", {}
+        encoded_file_id = encode_url_path_segment(file_id, field_name="file_id")
+        return f"{api_base.rstrip('/')}/v1/files/{encoded_file_id}", {}

    def transform_retrieve_file_response(
        self,
@ -206,7 +208,8 @@ class AnthropicFilesConfig(BaseFilesConfig):
            AnthropicModelInfo.get_api_base(litellm_params.get("api_base"))
            or ANTHROPIC_FILES_API_BASE
        )
-        return f"{api_base.rstrip('/')}/v1/files/{file_id}", {}
+        encoded_file_id = encode_url_path_segment(file_id, field_name="file_id")
+        return f"{api_base.rstrip('/')}/v1/files/{encoded_file_id}", {}

    def transform_delete_file_response(
        self,
@ -268,7 +271,8 @@ class AnthropicFilesConfig(BaseFilesConfig):
            AnthropicModelInfo.get_api_base(litellm_params.get("api_base"))
            or ANTHROPIC_FILES_API_BASE
        )
-        return f"{api_base.rstrip('/')}/v1/files/{file_id}/content", {}
+        encoded_file_id = encode_url_path_segment(file_id, field_name="file_id")
+        return f"{api_base.rstrip('/')}/v1/files/{encoded_file_id}/content", {}

    def transform_file_content_response(
        self,
--- a/litellm/llms/anthropic/skills/transformation.py
+++ b/litellm/llms/anthropic/skills/transformation.py
@ -7,6 +7,7 @@ from typing import Any, Dict, Optional, Tuple
 import httpx

 from litellm._logging import verbose_logger
+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.llms.base_llm.skills.transformation import (
    BaseSkillsAPIConfig,
    LiteLLMLoggingObj,
@ -81,7 +82,8 @@ class AnthropicSkillsConfig(BaseSkillsAPIConfig):
            api_base = AnthropicModelInfo.get_api_base()

        if skill_id:
-            return f"{api_base}/v1/skills/{skill_id}"
+            encoded_skill_id = encode_url_path_segment(skill_id, field_name="skill_id")
+            return f"{api_base}/v1/skills/{encoded_skill_id}"
        return f"{api_base}/v1/{endpoint}"

    def transform_create_skill_request(
--- a/litellm/llms/azure/azure.py
+++ b/litellm/llms/azure/azure.py
@ -16,6 +16,7 @@ import litellm
 from litellm.constants import AZURE_OPERATION_POLLING_TIMEOUT, DEFAULT_MAX_RETRIES
 from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
 from litellm.litellm_core_utils.logging_utils import track_llm_api_timing
+from litellm.litellm_core_utils.url_utils import SSRFError, assert_same_origin
 from litellm.llms.custom_httpx.http_handler import (
    AsyncHTTPHandler,
    HTTPHandler,
@ -43,6 +44,7 @@ from .common_utils import (
    select_azure_base_url_or_endpoint,
 )
 from .image_generation import get_azure_image_generation_config
+from .image_generation.http_utils import azure_deployment_image_generation_json_body


 class AzureOpenAIAssistantsAPIConfig:
@ -792,6 +794,7 @@ class AzureChatCompletion(BaseAzureLLM, BaseLLM):
                    client=client,
                    litellm_params=litellm_params,
                    api_base=api_base,
+                    api_version=api_version,
                )
            azure_client = self.get_azure_openai_client(
                api_version=api_version,
@ -898,6 +901,17 @@ class AzureChatCompletion(BaseAzureLLM, BaseLLM):
                operation_location_url = response.headers["operation-location"]
            else:
                raise AzureOpenAIError(status_code=500, message=response.text)
+            # Reject polling URLs that don't share an origin with ``api_base``.
+            # Without this an upstream-controlled or attacker-controlled
+            # value would receive the operator's Azure API key in the
+            # request headers below. VERIA-51.
+            try:
+                assert_same_origin(operation_location_url, api_base)
+            except SSRFError as ssrf_err:
+                raise AzureOpenAIError(
+                    status_code=502,
+                    message=f"Rejected polling URL: {ssrf_err}",
+                )
            response = await async_handler.get(
                url=operation_location_url,
                headers=headers,
@ -908,8 +922,13 @@ class AzureChatCompletion(BaseAzureLLM, BaseLLM):
            timeout_secs: int = AZURE_OPERATION_POLLING_TIMEOUT
            start_time = time.time()
            if "status" not in response.json():
-                raise Exception(
-                    "Expected 'status' in response. Got={}".format(response.json())
+                # Don't reflect the raw response body — when the polling
+                # URL points at an internal JSON API (cloud metadata
+                # service etc.) reflecting it here turns Blind SSRF into
+                # Full-Read SSRF. VERIA-51.
+                raise AzureOpenAIError(
+                    status_code=502,
+                    message="Polling response missing 'status' field",
                )
            while response.json()["status"] not in ["succeeded", "failed"]:
                if time.time() - start_time > timeout_secs:
@ -948,9 +967,10 @@ class AzureChatCompletion(BaseAzureLLM, BaseLLM):
                content=json.dumps(result).encode("utf-8"),
                request=httpx.Request(method="POST", url="https://api.openai.com/v1"),
            )
+        request_json = azure_deployment_image_generation_json_body(api_base, data)
        return await async_handler.post(
            url=api_base,
-            json=data,
+            json=request_json,
            headers=headers,
        )

@ -1009,6 +1029,13 @@ class AzureChatCompletion(BaseAzureLLM, BaseLLM):
                operation_location_url = response.headers["operation-location"]
            else:
                raise AzureOpenAIError(status_code=500, message=response.text)
+            try:
+                assert_same_origin(operation_location_url, api_base)
+            except SSRFError as ssrf_err:
+                raise AzureOpenAIError(
+                    status_code=502,
+                    message=f"Rejected polling URL: {ssrf_err}",
+                )
            response = sync_handler.get(
                url=operation_location_url,
                headers=headers,
@ -1019,8 +1046,9 @@ class AzureChatCompletion(BaseAzureLLM, BaseLLM):
            timeout_secs: int = AZURE_OPERATION_POLLING_TIMEOUT
            start_time = time.time()
            if "status" not in response.json():
-                raise Exception(
-                    "Expected 'status' in response. Got={}".format(response.json())
+                raise AzureOpenAIError(
+                    status_code=502,
+                    message="Polling response missing 'status' field",
                )
            while response.json()["status"] not in ["succeeded", "failed"]:
                if time.time() - start_time > timeout_secs:
@ -1059,9 +1087,10 @@ class AzureChatCompletion(BaseAzureLLM, BaseLLM):
                content=json.dumps(result).encode("utf-8"),
                request=httpx.Request(method="POST", url="https://api.openai.com/v1"),
            )
+        request_json = azure_deployment_image_generation_json_body(api_base, data)
        return sync_handler.post(
            url=api_base,
-            json=data,
+            json=request_json,
            headers=headers,
        )

--- a/litellm/llms/azure/cost_calculation.py
+++ b/litellm/llms/azure/cost_calculation.py
@ -12,7 +12,10 @@ from litellm.utils import get_model_info


 def cost_per_token(
-    model: str, usage: Usage, response_time_ms: Optional[float] = 0.0
+    model: str,
+    usage: Usage,
+    response_time_ms: Optional[float] = 0.0,
+    service_tier: Optional[str] = None,
 ) -> Tuple[float, float]:
    """
    Calculates the cost per token for a given model, prompt tokens, and completion tokens.
@ -47,4 +50,5 @@ def cost_per_token(
        model=model,
        usage=usage,
        custom_llm_provider="azure",
+        service_tier=service_tier,
    )
--- a/litellm/llms/azure/image_edit/transformation.py
+++ b/litellm/llms/azure/image_edit/transformation.py
@ -9,6 +9,19 @@ from litellm.utils import _add_path_to_api_base


 class AzureImageEditConfig(OpenAIImageEditConfig):
+    @staticmethod
+    def azure_deployment_image_edit_form_data(data: dict, request_url: str) -> dict:
+        """
+        Azure OpenAI ``.../openai/deployments/{deployment}/images/edits`` routes by
+        deployment in the URL; including ``model`` in multipart fields can break
+        the same way as image generations (LiteLLM #26316).
+
+        Non-deployment edit URLs keep ``model`` when present.
+        """
+        if "images/edits" in request_url and "/openai/deployments/" in request_url:
+            return {k: v for k, v in data.items() if k != "model"}
+        return data
+
    def validate_environment(
        self,
        headers: dict,
@ -83,3 +96,8 @@ class AzureImageEditConfig(OpenAIImageEditConfig):
        final_url = httpx.URL(new_url).copy_with(params=query_params)

        return str(final_url)
+
+    def finalize_image_edit_request_data(
+        self, data: dict, resolved_request_url: str
+    ) -> dict:
+        return self.azure_deployment_image_edit_form_data(data, resolved_request_url)
--- a/litellm/llms/azure/image_generation/__init__.py
+++ b/litellm/llms/azure/image_generation/__init__.py
@ -6,11 +6,13 @@ from litellm.llms.base_llm.image_generation.transformation import (
 from .dall_e_2_transformation import AzureDallE2ImageGenerationConfig
 from .dall_e_3_transformation import AzureDallE3ImageGenerationConfig
 from .gpt_transformation import AzureGPTImageGenerationConfig
+from .http_utils import azure_deployment_image_generation_json_body

 __all__ = [
    "AzureDallE2ImageGenerationConfig",
    "AzureDallE3ImageGenerationConfig",
    "AzureGPTImageGenerationConfig",
+    "azure_deployment_image_generation_json_body",
 ]


--- a/litellm/llms/azure/image_generation/http_utils.py
+++ b/litellm/llms/azure/image_generation/http_utils.py
@ -0,0 +1,17 @@
+"""HTTP helpers for Azure OpenAI image generation (REST, not SDK)."""
+
+
+def azure_deployment_image_generation_json_body(api_base: str, data: dict) -> dict:
+    """
+    Build the JSON body for Azure OpenAI image generation POSTs.
+
+    For ``.../openai/deployments/{deployment}/images/generations``, routing uses the
+    deployment in the URL only; sending ``model`` in the body (especially the deployment
+    name) breaks some models (e.g. gpt-image-2). See LiteLLM #26316.
+
+    Provider-style URLs (e.g. ``/providers/...`` for FLUX on Azure AI) keep all keys
+    so non–OpenAI-deployment payloads still work.
+    """
+    if "images/generations" in api_base and "/openai/deployments/" in api_base:
+        return {k: v for k, v in data.items() if k != "model"}
+    return data
--- a/litellm/llms/azure/responses/transformation.py
+++ b/litellm/llms/azure/responses/transformation.py
@ -5,6 +5,7 @@ import httpx
 from openai.types.responses import ResponseReasoningItem

 from litellm._logging import verbose_logger
+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.llms.azure.common_utils import BaseAzureLLM
 from litellm.llms.openai.responses.transformation import OpenAIResponsesAPIConfig
 from litellm.types.llms.openai import *
@ -201,7 +202,10 @@ class AzureOpenAIResponsesAPIConfig(OpenAIResponsesAPIConfig):
        # Insert the response_id at the end of the path component
        # Remove trailing slash if present to avoid double slashes
        path = parsed_url.path.rstrip("/")
-        new_path = f"{path}/{response_id}"
+        encoded_response_id = encode_url_path_segment(
+            response_id, field_name="response_id"
+        )
+        new_path = f"{path}/{encoded_response_id}"

        # Reconstruct the URL with all original components but with the modified path
        constructed_url = urlunparse(
@ -322,7 +326,10 @@ class AzureOpenAIResponsesAPIConfig(OpenAIResponsesAPIConfig):
        # Insert the response_id and /cancel at the end of the path component
        # Remove trailing slash if present to avoid double slashes
        path = parsed_url.path.rstrip("/")
-        new_path = f"{path}/{response_id}/cancel"
+        encoded_response_id = encode_url_path_segment(
+            response_id, field_name="response_id"
+        )
+        new_path = f"{path}/{encoded_response_id}/cancel"

        # Reconstruct the URL with all original components but with the modified path
        cancel_url = urlunparse(
--- a/litellm/llms/azure_ai/agents/handler.py
+++ b/litellm/llms/azure_ai/agents/handler.py
@ -36,6 +36,7 @@ from typing import (
 import httpx

 from litellm._logging import verbose_logger
+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.llms.azure_ai.agents.transformation import (
    AzureAIAgentsConfig,
    AzureAIAgentsError,
@ -75,20 +76,29 @@ class AzureAIAgentsHandler:
    def _build_messages_url(
        self, api_base: str, thread_id: str, api_version: str
    ) -> str:
-        return f"{api_base}/threads/{thread_id}/messages?api-version={api_version}"
+        encoded_thread_id = encode_url_path_segment(thread_id, field_name="thread_id")
+        return (
+            f"{api_base}/threads/{encoded_thread_id}/messages?api-version={api_version}"
+        )

    def _build_runs_url(self, api_base: str, thread_id: str, api_version: str) -> str:
-        return f"{api_base}/threads/{thread_id}/runs?api-version={api_version}"
+        encoded_thread_id = encode_url_path_segment(thread_id, field_name="thread_id")
+        return f"{api_base}/threads/{encoded_thread_id}/runs?api-version={api_version}"

    def _build_run_status_url(
        self, api_base: str, thread_id: str, run_id: str, api_version: str
    ) -> str:
-        return f"{api_base}/threads/{thread_id}/runs/{run_id}?api-version={api_version}"
+        encoded_thread_id = encode_url_path_segment(thread_id, field_name="thread_id")
+        encoded_run_id = encode_url_path_segment(run_id, field_name="run_id")
+        return f"{api_base}/threads/{encoded_thread_id}/runs/{encoded_run_id}?api-version={api_version}"

    def _build_list_messages_url(
        self, api_base: str, thread_id: str, api_version: str
    ) -> str:
-        return f"{api_base}/threads/{thread_id}/messages?api-version={api_version}"
+        encoded_thread_id = encode_url_path_segment(thread_id, field_name="thread_id")
+        return (
+            f"{api_base}/threads/{encoded_thread_id}/messages?api-version={api_version}"
+        )

    def _build_create_thread_and_run_url(self, api_base: str, api_version: str) -> str:
        """URL for the create-thread-and-run endpoint (supports streaming)."""
--- a/litellm/llms/azure_ai/anthropic/transformation.py
+++ b/litellm/llms/azure_ai/anthropic/transformation.py
@ -12,6 +12,23 @@ if TYPE_CHECKING:
    pass


+def _promote_extra_body_to_optional_params(optional_params: dict) -> None:
+    """Promote anthropic-native passthrough keys out of ``extra_body``.
+
+    ``azure_ai`` is an OpenAI-compatible provider, so non-OpenAI kwargs like
+    ``output_config`` get auto-routed into ``extra_body`` by
+    ``add_provider_specific_params_to_optional_params``. For the Azure→Anthropic
+    route those keys must reach the request body and be validated, so promote
+    them. ``setdefault`` keeps explicit top-level values authoritative.
+    """
+    extra_body = optional_params.get("extra_body")
+    if not isinstance(extra_body, dict) or not extra_body:
+        return
+    for k, v in extra_body.items():
+        optional_params.setdefault(k, v)
+    optional_params.pop("extra_body", None)
+
+
 class AzureAnthropicConfig(AnthropicConfig):
    """
    Azure Anthropic configuration that extends AnthropicConfig.
@ -39,6 +56,8 @@ class AzureAnthropicConfig(AnthropicConfig):
        1. API key via 'api-key' header
        2. Azure AD token via 'Authorization: Bearer <token>' header
        """
+        _promote_extra_body_to_optional_params(optional_params)
+
        # Convert dict to GenericLiteLLMParams if needed
        if isinstance(litellm_params, dict):
            # Ensure api_key is included if provided
@ -101,7 +120,8 @@ class AzureAnthropicConfig(AnthropicConfig):
        Transform request using parent AnthropicConfig, then remove unsupported params.
        Azure Anthropic doesn't support extra_body, max_retries, or stream_options parameters.
        """
-        # Call parent transform_request
+        _promote_extra_body_to_optional_params(optional_params)
+
        data = super().transform_request(
            model=model,
            messages=messages,
--- a/litellm/llms/azure_ai/cost_calculator.py
+++ b/litellm/llms/azure_ai/cost_calculator.py
@ -65,6 +65,7 @@ def cost_per_token(
    usage: Usage,
    response_time_ms: Optional[float] = 0.0,
    request_model: Optional[str] = None,
+    service_tier: Optional[str] = None,
 ) -> Tuple[float, float]:
    """
    Calculate the cost per token for Azure AI models.
@ -102,6 +103,7 @@ def cost_per_token(
            model=model,
            usage=usage,
            custom_llm_provider="azure_ai",
+            service_tier=service_tier,
        )
    except Exception as e:
        # For Model Router, the model name (e.g., "azure-model-router") may not be in the cost map
--- a/litellm/llms/azure_ai/ocr/document_intelligence/transformation.py
+++ b/litellm/llms/azure_ai/ocr/document_intelligence/transformation.py
@ -17,11 +17,13 @@ from urllib.parse import quote
 import httpx

 from litellm._logging import verbose_logger
+from litellm.litellm_core_utils.url_utils import SSRFError, assert_same_origin
 from litellm.constants import (
    AZURE_DOCUMENT_INTELLIGENCE_API_VERSION,
    AZURE_DOCUMENT_INTELLIGENCE_DEFAULT_DPI,
    AZURE_OPERATION_POLLING_TIMEOUT,
 )
+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.llms.base_llm.ocr.transformation import (
    BaseOCRConfig,
    DocumentType,
@ -217,11 +219,12 @@ class AzureDocumentIntelligenceOCRConfig(BaseOCRConfig):
        if "/" in model:
            # Extract the last part after the last slash
            model_id = model.split("/")[-1]
+        encoded_model_id = encode_url_path_segment(model_id, field_name="model_id")

        # Azure Document Intelligence analyze endpoint
        # Note: API version 2024-11-30+ uses /documentintelligence/ (not /formrecognizer/)
        url = (
-            f"{api_base}/documentintelligence/documentModels/{model_id}:analyze"
+            f"{api_base}/documentintelligence/documentModels/{encoded_model_id}:analyze"
            f"?api-version={AZURE_DOCUMENT_INTELLIGENCE_API_VERSION}"
        )

@ -599,6 +602,16 @@ class AzureDocumentIntelligenceOCRConfig(BaseOCRConfig):
                        "Azure Document Intelligence returned 202 but no Operation-Location header found"
                    )

+                # Reject cross-origin polling URLs — the auth headers
+                # below would otherwise leak to whatever URL the upstream
+                # (or an attacker-controlled upstream) returns. VERIA-51.
+                try:
+                    assert_same_origin(operation_url, str(raw_response.request.url))
+                except SSRFError as ssrf_err:
+                    raise ValueError(
+                        f"Azure Document Intelligence: rejected polling URL ({ssrf_err})"
+                    )
+
                # Get headers for polling (need auth)
                poll_headers = {
                    "Ocp-Apim-Subscription-Key": raw_response.request.headers.get(
@ -711,6 +724,14 @@ class AzureDocumentIntelligenceOCRConfig(BaseOCRConfig):
                        "Azure Document Intelligence returned 202 but no Operation-Location header found"
                    )

+                # Reject cross-origin polling URLs (see sync path). VERIA-51.
+                try:
+                    assert_same_origin(operation_url, str(raw_response.request.url))
+                except SSRFError as ssrf_err:
+                    raise ValueError(
+                        f"Azure Document Intelligence: rejected polling URL ({ssrf_err})"
+                    )
+
                # Get headers for polling (need auth)
                poll_headers = {
                    "Ocp-Apim-Subscription-Key": raw_response.request.headers.get(
--- a/litellm/llms/base_llm/chat/transformation.py
+++ b/litellm/llms/base_llm/chat/transformation.py
@ -87,9 +87,7 @@ class BaseConfig(ABC):
        return {
            k: v
            for k, v in cls.__dict__.items()
-            if not k.startswith("__")
-            and not k.startswith("_abc")
-            and not k.startswith("_is_base_class")
+            if not k.startswith("_")
            and not isinstance(
                v,
                (
--- a/litellm/llms/base_llm/image_edit/transformation.py
+++ b/litellm/llms/base_llm/image_edit/transformation.py
@ -102,6 +102,18 @@ class BaseImageEditConfig(ABC):
    ) -> Tuple[Dict, RequestFiles]:
        pass

+    def finalize_image_edit_request_data(
+        self, data: dict, resolved_request_url: str
+    ) -> dict:
+        """
+        Last pass on the request dict after ``transform_image_edit_request``, using the
+        exact URL string used for the HTTP POST (same as ``get_complete_url`` output).
+
+        The handler sends this dict as ``data=`` for multipart providers or ``json=``
+        for JSON-only providers; default implementation returns ``data`` unchanged.
+        """
+        return data
+
    @abstractmethod
    def transform_image_edit_response(
        self,
--- a/litellm/llms/base_llm/managed_resources/base_managed_resource.py
+++ b/litellm/llms/base_llm/managed_resources/base_managed_resource.py
@ -18,6 +18,11 @@ from typing import (
 )

 from litellm import verbose_logger
+from litellm.llms.base_llm.managed_resources.isolation import (
+    build_list_page,
+    build_owner_filter,
+    can_access_resource,
+)
 from litellm.proxy._types import UserAPIKeyAuth
 from litellm.types.utils import SpecialEnums

@ -169,6 +174,7 @@ class BaseManagedResource(ABC, Generic[ResourceObjectType]):
            "model_mappings": model_mappings,
            "flat_model_resource_ids": list(model_mappings.values()),
            "created_by": user_api_key_dict.user_id,
+            "team_id": user_api_key_dict.team_id,
            "updated_by": user_api_key_dict.user_id,
        }

@ -190,6 +196,7 @@ class BaseManagedResource(ABC, Generic[ResourceObjectType]):
            "model_mappings": json.dumps(model_mappings),
            "flat_model_resource_ids": list(model_mappings.values()),
            "created_by": user_api_key_dict.user_id,
+            "team_id": user_api_key_dict.team_id,
            "updated_by": user_api_key_dict.user_id,
        }

@ -316,15 +323,17 @@ class BaseManagedResource(ABC, Generic[ResourceObjectType]):
        Returns:
            True if user has access, False otherwise
        """
-        user_id = user_api_key_dict.user_id
-
        # Use cached method instead of direct DB query
        resource = await self.get_unified_resource_id(
            unified_resource_id, litellm_parent_otel_span
        )

        if resource:
-            return resource.get("created_by") == user_id
+            return can_access_resource(
+                user_api_key_dict=user_api_key_dict,
+                created_by=resource.get("created_by"),
+                resource_team_id=resource.get("team_id"),
+            )

        return False

@ -549,11 +558,11 @@ class BaseManagedResource(ABC, Generic[ResourceObjectType]):
        Returns:
            Dictionary with list of resources and pagination info
        """
-        where_clause: Dict[str, Any] = {}
+        owner_filter = build_owner_filter(user_api_key_dict)
+        if owner_filter is None:
+            return build_list_page([])

-        # Filter by user who created the resource
-        if user_api_key_dict.user_id:
-            where_clause["created_by"] = user_api_key_dict.user_id
+        where_clause: Dict[str, Any] = {**owner_filter}

        if after:
            where_clause["id"] = {"gt": after}
@ -598,10 +607,6 @@ class BaseManagedResource(ABC, Generic[ResourceObjectType]):
                )
                continue

-        return {
-            "object": "list",
-            "data": resource_objects,
-            "first_id": resource_objects[0].id if resource_objects else None,
-            "last_id": resource_objects[-1].id if resource_objects else None,
-            "has_more": len(resource_objects) == (limit or 20),
-        }
+        return build_list_page(
+            resource_objects, has_more=len(resource_objects) == (limit or 20)
+        )
--- a/litellm/llms/base_llm/managed_resources/isolation.py
+++ b/litellm/llms/base_llm/managed_resources/isolation.py
@ -0,0 +1,99 @@
+"""
+Tenant-isolation helpers for managed file/batch/vector-store resources.
+
+Returns a Prisma filter and an ownership check that scope managed resources
+to the caller's identity: proxy admins see everything, user-keyed callers
+see records they created, and service-account keys (no user_id) fall back
+to the resource's owning team. Callers with no admin role and no
+identifying ids are denied so an empty user_id can never select an
+unscoped query.
+"""
+
+from typing import Any, Dict, List, Optional
+
+from litellm.proxy._types import (
+    UserAPIKeyAuth,
+    user_api_key_has_admin_view as _user_has_admin_view,
+)
+
+
+def build_list_page(items: List[Any], has_more: bool = False) -> Dict[str, Any]:
+    """Build the OpenAI-style paginated list response shape used by managed
+    file/batch/vector-store listings. ``first_id`` and ``last_id`` are
+    sourced from each item's ``.id`` attribute."""
+    return {
+        "object": "list",
+        "data": items,
+        "first_id": items[0].id if items else None,
+        "last_id": items[-1].id if items else None,
+        "has_more": has_more,
+    }
+
+
+def build_owner_filter(
+    user_api_key_dict: UserAPIKeyAuth,
+) -> Optional[Dict[str, Any]]:
+    """Return a Prisma `where` fragment that scopes a managed-resource listing
+    to records the caller is allowed to see.
+
+    - ``{}`` means no scoping (proxy admins).
+    - ``{"created_by": <user_id>}`` for user-keyed callers.
+    - ``{"team_id": <team_id>}`` for service-account callers
+      that have a team but no user_id.
+    - ``{"OR": [...]}`` when the caller has both — listing must include
+      both their own resources and team-shared ones so it stays consistent
+      with ``can_access_resource``.
+    - ``None`` means deny: callers MUST skip the query rather than fall
+      back to an unscoped fetch.
+    """
+    if _user_has_admin_view(user_api_key_dict):
+        return {}
+
+    user_id = user_api_key_dict.user_id
+    team_id = user_api_key_dict.team_id
+
+    if user_id is not None and team_id is not None:
+        return {
+            "OR": [
+                {"created_by": user_id},
+                {"team_id": team_id},
+            ]
+        }
+
+    if user_id is not None:
+        return {"created_by": user_id}
+
+    if team_id is not None:
+        return {"team_id": team_id}
+
+    return None
+
+
+def can_access_resource(
+    user_api_key_dict: UserAPIKeyAuth,
+    created_by: Optional[str],
+    resource_team_id: Optional[str],
+) -> bool:
+    """Return True iff the caller may read/modify a managed resource.
+
+    The resource's ``created_by`` and ``team_id`` fields must be non-None
+    to match the caller's identity — guarding against the ``None == None``
+    bypass that previously let service-account keys read every keyless
+    resource.
+    """
+    if _user_has_admin_view(user_api_key_dict):
+        return True
+
+    user_id = user_api_key_dict.user_id
+    if user_id is not None and created_by is not None and created_by == user_id:
+        return True
+
+    team_id = user_api_key_dict.team_id
+    if (
+        team_id is not None
+        and resource_team_id is not None
+        and resource_team_id == team_id
+    ):
+        return True
+
+    return False
--- a/litellm/llms/bedrock/chat/converse_transformation.py
+++ b/litellm/llms/bedrock/chat/converse_transformation.py
@ -31,7 +31,11 @@ from litellm.litellm_core_utils.prompt_templates.factory import (
    _bedrock_converse_messages_pt,
    _bedrock_tools_pt,
 )
-from litellm.llms.anthropic.chat.transformation import AnthropicConfig
+from litellm.llms.anthropic.chat.transformation import (
+    DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING,
+    REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT,
+    AnthropicConfig,
+)
 from litellm.llms.base_llm.chat.transformation import BaseConfig, BaseLLMException
 from litellm.types.llms.bedrock import *
 from litellm.types.llms.openai import (
@ -189,7 +193,7 @@ class AmazonConverseConfig(BaseConfig):
        return {
            k: v
            for k, v in cls.__dict__.items()
-            if not k.startswith("__")
+            if not k.startswith("_")
            and not isinstance(
                v,
                (
@ -410,47 +414,64 @@ class AmazonConverseConfig(BaseConfig):
        """
        Handle the reasoning_effort parameter based on the model type.

-        Different model families handle reasoning effort differently:
-        - GPT-OSS models: Keep reasoning_effort as-is (passed to additionalModelRequestFields)
-        - Nova 2 models: Transform to reasoningConfig structure
-        - Other models (Anthropic, etc.): Convert to thinking parameter
-
-        Args:
-            model: The model identifier
-            reasoning_effort: The reasoning effort value
-            optional_params: Dictionary of optional parameters to update in-place
-
-        Examples:
-            >>> config = AmazonConverseConfig()
-            >>> params = {}
-            >>> config._handle_reasoning_effort_parameter("gpt-oss-model", "high", params)
-            >>> params
-            {'reasoning_effort': 'high'}
-
-            >>> params = {}
-            >>> config._handle_reasoning_effort_parameter("amazon.nova-2-lite-v1:0", "high", params)
-            >>> params
-            {'reasoningConfig': {'type': 'enabled', 'maxReasoningEffort': 'high'}}
-
-            >>> params = {}
-            >>> config._handle_reasoning_effort_parameter("anthropic.claude-3", "high", params)
-            >>> params
-            {'thinking': {'type': 'enabled', 'budget_tokens': 10000}}
+        - GPT-OSS models: passed through unchanged via additionalModelRequestFields.
+        - Nova 2 models: transformed to reasoningConfig.
+        - Anthropic models: mapped to ``thinking`` (and ``output_config.effort`` on
+          adaptive Claude 4.6 / 4.7).
        """
        if "gpt-oss" in model:
-            # GPT-OSS models: keep reasoning_effort as-is
-            # It will be passed through to additionalModelRequestFields
            optional_params["reasoning_effort"] = reasoning_effort
        elif self._is_nova_2_model(model):
-            # Nova 2 models: transform to reasoningConfig
            reasoning_config = self._transform_reasoning_effort_to_reasoning_config(
                reasoning_effort
            )
            optional_params.update(reasoning_config)
        else:
-            # Anthropic and other models: convert to thinking parameter
-            optional_params["thinking"] = AnthropicConfig._map_reasoning_effort(
-                reasoning_effort=reasoning_effort, model=model
+            mapped_thinking = AnthropicConfig._map_reasoning_effort(
+                reasoning_effort=reasoning_effort,
+                model=model,
+                llm_provider="bedrock_converse",
+            )
+            if mapped_thinking is None:
+                optional_params.pop("thinking", None)
+                optional_params.pop("output_config", None)
+            else:
+                optional_params["thinking"] = mapped_thinking
+                if AnthropicConfig._is_adaptive_thinking_model(model):
+                    mapped_effort = REASONING_EFFORT_TO_OUTPUT_CONFIG_EFFORT.get(
+                        reasoning_effort
+                    )
+                    if mapped_effort is None:
+                        AnthropicConfig._raise_invalid_reasoning_effort(
+                            model=model,
+                            value=reasoning_effort,
+                            llm_provider="bedrock_converse",
+                        )
+                    self._validate_anthropic_adaptive_effort(
+                        model=model, effort=mapped_effort
+                    )
+                    optional_params["output_config"] = {"effort": mapped_effort}
+
+    @staticmethod
+    def _validate_anthropic_adaptive_effort(model: str, effort: str) -> None:
+        """Validate ``output_config.effort`` for adaptive-thinking Claude 4.6/4.7."""
+        valid_efforts = {"high", "medium", "low", "xhigh", "max"}
+        if effort not in valid_efforts:
+            raise litellm.exceptions.BadRequestError(
+                message=(
+                    f"Invalid reasoning_effort/output_config.effort value: "
+                    f"{effort!r}. Must be one of: 'low', 'medium', 'high', "
+                    f"'xhigh', or 'max'."
+                ),
+                model=model,
+                llm_provider="bedrock_converse",
+            )
+        error = AnthropicConfig._validate_effort_for_model(model=model, effort=effort)
+        if error is not None:
+            raise litellm.exceptions.BadRequestError(
+                message=error,
+                model=model,
+                llm_provider="bedrock_converse",
            )

    @staticmethod
@ -1192,9 +1213,11 @@ class AmazonConverseConfig(BaseConfig):
            + supported_config_params
        )
        inference_params.pop("json_mode", None)  # used for handling json_schema
-        # Anthropic-only key. Bedrock expects `outputConfig` (camelCase) and
-        # will reject `output_config` if it leaks through pass-through routes.
-        inference_params.pop("output_config", None)
+
+        # Anthropic-only ``output_config`` (snake_case) — re-attached to
+        # ``additionalModelRequestFields`` for Anthropic models below. The
+        # Bedrock-native ``outputConfig`` (camelCase) is handled separately.
+        anthropic_output_config = inference_params.pop("output_config", None)

        # Extract requestMetadata before processing other parameters
        request_metadata = inference_params.pop("requestMetadata", None)
@ -1204,9 +1227,6 @@ class AmazonConverseConfig(BaseConfig):
        output_config: Optional[OutputConfigBlock] = inference_params.pop(
            "outputConfig", None
        )
-        inference_params.pop(
-            "output_config", None
-        )  # Bedrock Converse doesn't support it

        # keep supported params in 'inference_params', and set all model-specific params in 'additional_request_params'
        additional_request_params = {
@ -1249,6 +1269,27 @@ class AmazonConverseConfig(BaseConfig):
            additional_request_params
        )

+        if anthropic_output_config is not None and isinstance(
+            anthropic_output_config, dict
+        ):
+            base_model = BedrockModelInfo.get_base_model(model)
+            if base_model.startswith("anthropic"):
+                if (
+                    litellm.drop_params is True
+                    and not AnthropicConfig._model_supports_effort_param(model)
+                ):
+                    litellm.verbose_logger.warning(
+                        DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING,
+                        model,
+                    )
+                else:
+                    effort = anthropic_output_config.get("effort")
+                    if effort is not None:
+                        self._validate_anthropic_adaptive_effort(
+                            model=model, effort=effort
+                        )
+                    additional_request_params["output_config"] = anthropic_output_config
+
        return (
            inference_params,
            additional_request_params,
@ -1372,9 +1413,25 @@ class AmazonConverseConfig(BaseConfig):
        # Append pre-formatted tools (systemTool etc.) after transformation
        bedrock_tools.extend(pre_formatted_tools)

+        # Opus 4.5 gates ``output_config.effort`` behind a beta header;
+        # Claude 4.6/4.7 accept it without one.
+        base_model = BedrockModelInfo.get_base_model(model)
+        if base_model.startswith("anthropic"):
+            output_config = additional_request_params.get("output_config")
+            if (
+                isinstance(output_config, dict)
+                and output_config.get("effort") is not None
+                and not AnthropicConfig._is_adaptive_thinking_model(model)
+            ):
+                from litellm.types.llms.anthropic import (
+                    ANTHROPIC_EFFORT_BETA_HEADER,
+                )
+
+                if ANTHROPIC_EFFORT_BETA_HEADER not in anthropic_beta_list:
+                    anthropic_beta_list.append(ANTHROPIC_EFFORT_BETA_HEADER)
+
        # Set anthropic_beta in additional_request_params if we have any beta features
        # ONLY apply to Anthropic/Claude models - other models (e.g., Qwen, Llama) don't support this field
-        base_model = BedrockModelInfo.get_base_model(model)
        if anthropic_beta_list and base_model.startswith("anthropic"):
            additional_request_params["anthropic_beta"] = anthropic_beta_list
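
Reviewer note: the `output_config` handling above boils down to a small decision: drop the field when `drop_params` is on and the model lacks effort support, otherwise validate `effort` and forward the dict. The following is a minimal standalone sketch of that decision, not the code in the diff. `resolve_output_config` and its boolean `model_supports_effort` argument are hypothetical names, and the per-model check done by `_validate_effort_for_model` is left out.

```python
from typing import Any, Dict, Optional

# Effort values accepted by the validation helper in the diff.
VALID_EFFORTS = {"low", "medium", "high", "xhigh", "max"}


def resolve_output_config(
    output_config: Any,
    *,
    drop_params: bool,
    model_supports_effort: bool,
) -> Optional[Dict[str, Any]]:
    """Return the output_config to forward, or None when it is dropped.

    Hypothetical helper: it mirrors the branch structure of the diff but
    takes plain booleans instead of inspecting a model name.
    """
    if not isinstance(output_config, dict):
        return None
    if drop_params and not model_supports_effort:
        # The diff logs a warning here and does not forward the field.
        return None
    effort = output_config.get("effort")
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"Invalid output_config.effort value: {effort!r}")
    return output_config


forwarded = resolve_output_config(
    {"effort": "high"}, drop_params=False, model_supports_effort=True
)
dropped = resolve_output_config(
    {"effort": "high"}, drop_params=True, model_supports_effort=False
)
```

In this sketch a model without effort support still receives the field when `drop_params` is off, which matches the `else` branch in the diff, where the value is validated and passed through.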

--- a/litellm/llms/bedrock/chat/invoke_agent/transformation.py
+++ b/litellm/llms/bedrock/chat/invoke_agent/transformation.py
@ -12,6 +12,7 @@ import httpx

 from litellm._logging import verbose_logger
 from litellm._uuid import uuid
+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.litellm_core_utils.prompt_templates.common_utils import (
    convert_content_list_to_str,
 )
@ -97,8 +98,15 @@ class AmazonInvokeAgentConfig(BaseConfig, BaseAWSLLM):

        agent_id, agent_alias_id = self._get_agent_id_and_alias_id(model)
        session_id = self._get_session_id(optional_params)
+        encoded_agent_id = encode_url_path_segment(agent_id, field_name="agent_id")
+        encoded_agent_alias_id = encode_url_path_segment(
+            agent_alias_id, field_name="agent_alias_id"
+        )
+        encoded_session_id = encode_url_path_segment(
+            session_id, field_name="session_id"
+        )

-        endpoint_url = f"{endpoint_url}/agents/{agent_id}/agentAliases/{agent_alias_id}/sessions/{session_id}/text"
+        endpoint_url = f"{endpoint_url}/agents/{encoded_agent_id}/agentAliases/{encoded_agent_alias_id}/sessions/{encoded_session_id}/text"

        return endpoint_url
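
Reviewer note: the hunk above encodes each identifier before interpolating it into the URL path. The implementation of `encode_url_path_segment` is not shown in this diff, so the sketch below uses a hypothetical stand-in, `encode_path_segment`, built on the standard library. Rejecting empty values and dot segments is an assumption of this sketch, not confirmed behavior of the real helper.

```python
from urllib.parse import quote


def encode_path_segment(value: str, field_name: str) -> str:
    """Hypothetical stand-in for encode_url_path_segment.

    Percent-encodes a value so it occupies exactly one URL path segment.
    """
    if not value:
        raise ValueError(f"{field_name} must be a non-empty string")
    if value in {".", ".."}:
        # Percent-encoding leaves dots untouched, so reject dot segments
        # explicitly (an assumption of this sketch).
        raise ValueError(f"{field_name} must not be a dot segment")
    # safe="" makes quote() escape "/" as well as "?" and "#".
    return quote(value, safe="")


agent_id = "AGENT/../../other"
url = (
    "https://example.invalid/agents/"
    f"{encode_path_segment(agent_id, 'agent_id')}/text"
)
```

With `safe=""` the slashes in the hostile value become `%2F`, so the value cannot add or remove path segments in the final URL.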

--- a/litellm/llms/bedrock/chat/invoke_transformations/anthropic_claude3_transformation.py
+++ b/litellm/llms/bedrock/chat/invoke_transformations/anthropic_claude3_transformation.py
@ -169,7 +169,6 @@ class AmazonAnthropicClaudeConfig(AmazonInvokeConfig, AnthropicConfig):
        anthropic_request.pop("model", None)
        anthropic_request.pop("stream", None)
        anthropic_request.pop("output_format", None)
-        anthropic_request.pop("output_config", None)
        if "anthropic_version" not in anthropic_request:
            anthropic_request["anthropic_version"] = self.anthropic_version

--- a/litellm/llms/bedrock/count_tokens/transformation.py
+++ b/litellm/llms/bedrock/count_tokens/transformation.py
@ -201,13 +201,14 @@ class BedrockCountTokensConfig(BaseAWSLLM):
        # Remove bedrock/ prefix if present
        if model_id.startswith("bedrock/"):
            model_id = model_id[8:]  # Remove "bedrock/" prefix
+        encoded_model_id = self.encode_model_id(model_id=model_id)

        base_url, _ = self.get_runtime_endpoint(
            api_base=api_base,
            aws_bedrock_runtime_endpoint=aws_bedrock_runtime_endpoint,
            aws_region_name=aws_region_name,
        )
-        endpoint = f"{base_url}/model/{model_id}/count-tokens"
+        endpoint = f"{base_url}/model/{encoded_model_id}/count-tokens"

        return endpoint

--- a/litellm/llms/bedrock/files/handler.py
+++ b/litellm/llms/bedrock/files/handler.py
@ -1,10 +1,17 @@
 import asyncio
 import base64
-from typing import Any, Coroutine, Optional, Tuple, Union
+import os
+from types import MappingProxyType
+from typing import Any, Coroutine, Mapping, Optional, Tuple, Union, cast

 import httpx

 from litellm import LlmProviders
+from litellm.litellm_core_utils.cloud_storage_security import (
+    BEDROCK_MANAGED_S3_PREFIXES,
+    should_allow_legacy_cloud_file_ids,
+    validate_managed_cloud_file_id,
+)
 from litellm.llms.custom_httpx.http_handler import get_async_httpx_client
 from litellm.types.llms.openai import (
    FileContentRequest,
@ -35,7 +42,7 @@ class BedrockFilesHandler(BaseAWSLLM):

        The file ID can be in two formats:
        1. Base64-encoded unified file ID containing: llm_output_file_id,s3://bucket/path
-        2. Direct S3 URI: s3://bucket/path
+        2. Direct S3 URI: s3://bucket/litellm-managed-prefix/path

        Args:
            file_id: Encoded file ID or direct S3 URI
@ -58,14 +65,19 @@ class BedrockFilesHandler(BaseAWSLLM):
        except Exception:
            pass

-        # If not base64 encoded or doesn't contain llm_output_file_id, assume it's already an S3 URI
+        # If not base64 encoded or doesn't contain llm_output_file_id, accept only
+        # explicit S3 URIs. Bucket and key validation happens before any S3 call.
        if file_id.startswith("s3://"):
            return file_id

-        # If it doesn't start with s3://, assume it's a direct S3 URI and add the prefix
-        return f"s3://{file_id}"
+        raise ValueError("file_id must be a managed LiteLLM S3 file id")

-    def _parse_s3_uri(self, s3_uri: str) -> Tuple[str, str]:
+    def _parse_s3_uri(
+        self,
+        s3_uri: str,
+        configured_bucket_name: str,
+        allow_legacy_cloud_file_ids: bool = False,
+    ) -> Tuple[str, str]:
        """
        Parse S3 URI to extract bucket name and object key.

@ -75,21 +87,34 @@ class BedrockFilesHandler(BaseAWSLLM):
        Returns:
            Tuple of (bucket_name, object_key)
        """
-        if not s3_uri.startswith("s3://"):
-            raise ValueError(
-                f"Invalid S3 URI format: {s3_uri}. Expected format: s3://bucket-name/path/to/file"
+        return validate_managed_cloud_file_id(
+            file_id=s3_uri,
+            scheme="s3://",
+            configured_bucket_name=configured_bucket_name,
+            allowed_object_prefixes=BEDROCK_MANAGED_S3_PREFIXES,
+            allow_legacy_cloud_file_ids=allow_legacy_cloud_file_ids,
+        )
+
+    def _get_configured_s3_bucket_name(self, litellm_params: dict) -> str:
+        trusted_model_credentials = litellm_params.get(
+            "_litellm_internal_model_credentials"
+        )
+        bucket_name = None
+        if isinstance(trusted_model_credentials, MappingProxyType):
+            trusted_model_credentials_mapping = cast(
+                Mapping[str, Any], trusted_model_credentials
            )
-
-        # Remove 's3://' prefix
-        path = s3_uri[5:]
-
-        if "/" in path:
-            bucket_name, object_key = path.split("/", 1)
-        else:
-            bucket_name = path
-            object_key = ""
-
-        return bucket_name, object_key
+            candidate_bucket_name = trusted_model_credentials_mapping.get(
+                "s3_bucket_name"
+            )
+            if isinstance(candidate_bucket_name, str):
+                bucket_name = candidate_bucket_name
+        bucket_name = bucket_name or os.getenv("AWS_S3_BUCKET_NAME")
+        if not bucket_name:
+            raise ValueError(
+                "S3 bucket_name is required. Set 's3_bucket_name' in proxy config or AWS_S3_BUCKET_NAME for Bedrock file content retrieval."
+            )
+        return bucket_name

    async def afile_content(
        self,
@ -119,7 +144,14 @@ class BedrockFilesHandler(BaseAWSLLM):

        # Extract S3 URI from file ID
        s3_uri = self._extract_s3_uri_from_file_id(file_id)
-        bucket_name, object_key = self._parse_s3_uri(s3_uri)
+        configured_bucket_name = self._get_configured_s3_bucket_name(optional_params)
+        bucket_name, object_key = self._parse_s3_uri(
+            s3_uri=s3_uri,
+            configured_bucket_name=configured_bucket_name,
+            allow_legacy_cloud_file_ids=should_allow_legacy_cloud_file_ids(
+                optional_params
+            ),
+        )

        # Get AWS credentials
        aws_region_name = self._get_aws_region_name(
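
Reviewer note: the handler now delegates S3 URI parsing to `validate_managed_cloud_file_id`, whose body is not part of this diff. The sketch below illustrates the general idea under stated assumptions: the bucket must equal the configured bucket and the key must start with a managed prefix. `validate_managed_s3_uri` and the values in `MANAGED_PREFIXES` are hypothetical, and the legacy-ID escape hatch is omitted.

```python
from typing import Sequence, Tuple

# Hypothetical prefixes; the real BEDROCK_MANAGED_S3_PREFIXES values
# are not visible in this diff.
MANAGED_PREFIXES = ("litellm-managed/batch/", "litellm-managed/upload/")


def validate_managed_s3_uri(
    s3_uri: str,
    configured_bucket: str,
    allowed_prefixes: Sequence[str] = MANAGED_PREFIXES,
) -> Tuple[str, str]:
    """Return (bucket, key) only for URIs inside the managed area."""
    scheme = "s3://"
    if not s3_uri.startswith(scheme):
        raise ValueError("file_id must be an s3:// URI")
    bucket, _, key = s3_uri[len(scheme):].partition("/")
    if bucket != configured_bucket:
        raise ValueError("bucket does not match the configured bucket")
    if not key.startswith(tuple(allowed_prefixes)):
        raise ValueError("object key is outside the managed prefixes")
    if ".." in key.split("/"):
        # Assumption of this sketch: refuse parent-directory segments.
        raise ValueError("object key must not contain '..' segments")
    return bucket, key


bucket, key = validate_managed_s3_uri(
    "s3://my-bucket/litellm-managed/batch/model-1234.jsonl",
    configured_bucket="my-bucket",
)
```

Checking the bucket against server-side configuration, rather than trusting the bucket named in the file ID, is what stops a caller from pointing the handler at an arbitrary bucket.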
--- a/litellm/llms/bedrock/files/transformation.py
+++ b/litellm/llms/bedrock/files/transformation.py
@ -2,6 +2,7 @@ import json
 import os
 import time
 from typing import Any, Dict, List, Optional, Tuple, Union
+from urllib.parse import unquote

 import httpx
 from httpx import Headers, Response
@ -10,6 +11,14 @@ from openai.types.file_deleted import FileDeleted
 from litellm._logging import verbose_logger
 from litellm._uuid import uuid
 from litellm.files.utils import FilesAPIUtils
+from litellm.litellm_core_utils.cloud_storage_security import (
+    BEDROCK_MANAGED_S3_BATCH_PREFIX,
+    BEDROCK_MANAGED_S3_UPLOAD_PREFIX,
+    build_managed_cloud_object_name,
+    encode_s3_object_key_for_url,
+    sanitize_cloud_object_component,
+    split_configured_cloud_bucket_name,
+)
 from litellm.litellm_core_utils.prompt_templates.common_utils import extract_file_data
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
 from litellm.llms.base_llm.files.transformation import (
@ -116,10 +125,13 @@ class BedrockFilesConfig(BaseAWSLLM, BaseFilesConfig):
        if _model.startswith("bedrock/"):
            _model = _model[8:]

-        # Replace colons with hyphens for Bedrock S3 URI compliance
-        _model = _model.replace(":", "-")
+        safe_model = sanitize_cloud_object_component(
+            _model.replace(":", "-"), fallback="model"
+        )

-        object_name = f"litellm-bedrock-files-{_model}-{uuid.uuid4()}.jsonl"
+        object_name = (
+            f"{BEDROCK_MANAGED_S3_BATCH_PREFIX}{safe_model}-{uuid.uuid4()}.jsonl"
+        )
        return object_name

    def get_object_name(
@ -146,12 +158,13 @@ class BedrockFilesConfig(BaseAWSLLM, BaseFilesConfig):
            if len(openai_jsonl_content) > 0:
                return self._get_s3_object_name_from_batch_jsonl(openai_jsonl_content)

-        ## 2. If not jsonl, return the filename
+        ## 2. If not jsonl, store under a server-generated managed object name
        filename = extracted_file_data.get("filename")
-        if filename:
-            return filename
-        ## 3. If no file name, return timestamp
-        return str(int(time.time()))
+        return build_managed_cloud_object_name(
+            prefix=BEDROCK_MANAGED_S3_UPLOAD_PREFIX,
+            filename=filename,
+            fallback_filename="file",
+        )

    def get_complete_file_url(
        self,
@ -172,6 +185,7 @@ class BedrockFilesConfig(BaseAWSLLM, BaseFilesConfig):
            raise ValueError(
                "S3 bucket_name is required. Set 's3_bucket_name' in litellm_params or AWS_S3_BUCKET_NAME env var"
            )
+        bucket_name, object_prefix = split_configured_cloud_bucket_name(bucket_name)

        s3_region_name = litellm_params.get("s3_region_name") or optional_params.get(
            "s3_region_name"
@ -188,14 +202,17 @@ class BedrockFilesConfig(BaseAWSLLM, BaseFilesConfig):
            raise ValueError("purpose is required")
        extracted_file_data = extract_file_data(file_data)
        object_name = self.get_object_name(extracted_file_data, purpose)
+        if object_prefix:
+            object_name = f"{object_prefix}/{object_name}"
+        encoded_object_name = encode_s3_object_key_for_url(object_name)

        # S3 endpoint URL format
        s3_endpoint_url = (
            optional_params.get("s3_endpoint_url")
            or f"https://s3.{aws_region_name}.amazonaws.com"
-        )
+        ).rstrip("/")

-        return f"{s3_endpoint_url}/{bucket_name}/{object_name}"
+        return f"{s3_endpoint_url}/{bucket_name}/{encoded_object_name}"

    def get_supported_openai_params(
        self, model: str
@ -532,10 +549,12 @@ class BedrockFilesConfig(BaseAWSLLM, BaseFilesConfig):
        if match1:
            # Pattern: https://s3.region.amazonaws.com/bucket/key
            region, bucket, key = match1.groups()
+            key = unquote(key)
            s3_uri = f"s3://{bucket}/{key}"
        elif match2:
            # Pattern: https://bucket.s3.region.amazonaws.com/key
            bucket, region, key = match2.groups()
+            key = unquote(key)
            s3_uri = f"s3://{bucket}/{key}"
        else:
            # Fallback: try to extract bucket and key from URL path
@ -545,6 +564,7 @@ class BedrockFilesConfig(BaseAWSLLM, BaseFilesConfig):
            path_parts = parsed.path.lstrip("/").split("/", 1)
            if len(path_parts) >= 2:
                bucket, key = path_parts[0], path_parts[1]
+                key = unquote(key)
                s3_uri = f"s3://{bucket}/{key}"
            else:
                raise ValueError(f"Unable to parse S3 URL: {https_url}")
@ -722,7 +742,12 @@ class BedrockJsonlFilesTransformation:
        # Remove bedrock/ prefix if present
        if _model.startswith("bedrock/"):
            _model = _model[8:]
-        object_name = f"litellm-bedrock-files-{_model}-{uuid.uuid4()}.jsonl"
+        safe_model = sanitize_cloud_object_component(
+            _model.replace(":", "-"), fallback="model"
+        )
+        object_name = (
+            f"{BEDROCK_MANAGED_S3_BATCH_PREFIX}{safe_model}-{uuid.uuid4()}.jsonl"
+        )
        return object_name

    def _get_content_from_openai_file(self, openai_file_content: FileTypes) -> str:
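
Reviewer note: this file now percent-encodes the object key when building the upload URL and calls `unquote` when turning an HTTPS URL back into an `s3://` URI. The sketch below shows why the two steps must be paired. `encode_key_for_url` is a hypothetical stand-in for `encode_s3_object_key_for_url`, which is not shown in the diff, and keeping `/` unescaped is an assumption.

```python
from urllib.parse import quote, unquote


def encode_key_for_url(key: str) -> str:
    """Hypothetical stand-in: escape a key but keep its '/' separators."""
    return quote(key, safe="/")


original_key = "litellm managed/report #1.jsonl"
encoded_key = encode_key_for_url(original_key)

# Decoding the URL form recovers the key that was stored in S3.
round_tripped = unquote(encoded_key)
s3_uri = f"s3://my-bucket/{round_tripped}"
```

Without the `unquote` step, the reconstructed URI would carry the escaped form (`%20`, `%23`) and would name a different object than the one uploaded.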
--- a/litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py
+++ b/litellm/llms/bedrock/messages/invoke_transformations/anthropic_claude3_transformation.py
@ -12,9 +12,14 @@ from typing import (

 import httpx

+import litellm
 from litellm.anthropic_beta_headers_manager import filter_and_transform_beta_headers
 from litellm.constants import BEDROCK_MIN_THINKING_BUDGET_TOKENS
 from litellm.litellm_core_utils.litellm_logging import verbose_logger
+from litellm.llms.anthropic.chat.transformation import (
+    DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING,
+    AnthropicConfig,
+)
 from litellm.llms.anthropic.common_utils import AnthropicModelInfo
 from litellm.llms.anthropic.experimental_pass_through.messages.transformation import (
    AnthropicMessagesConfig,
@ -580,6 +585,17 @@ class AmazonAnthropicClaudeMessagesConfig(
        if filtered_betas:
            anthropic_messages_request["anthropic_beta"] = filtered_betas

+        if (
+            litellm.drop_params is True
+            and "output_config" in anthropic_messages_request
+            and not AnthropicConfig._model_supports_effort_param(model)
+        ):
+            verbose_logger.warning(
+                DROP_UNSUPPORTED_OUTPUT_CONFIG_WARNING,
+                model,
+            )
+            anthropic_messages_request.pop("output_config", None)
+
        # 7. Final safety net: filter top-level fields to the Bedrock Invoke allowlist.
        # Catches Anthropic-only extensions (context_management, output_config, speed,
        # mcp_servers, ...) and any future additions Claude Code may start sending.
--- a/litellm/llms/bedrock/vector_stores/transformation.py
+++ b/litellm/llms/bedrock/vector_stores/transformation.py
@ -5,6 +5,7 @@ from urllib.parse import urlparse
 import httpx

 from litellm._logging import verbose_logger
+from litellm.litellm_core_utils.url_utils import encode_url_path_segment
 from litellm.llms.base_llm.vector_store.transformation import BaseVectorStoreConfig
 from litellm.llms.bedrock.base_aws_llm import BaseAWSLLM
 from litellm.types.integrations.rag.bedrock_knowledgebase import (
@ -209,7 +210,10 @@ class BedrockVectorStoreConfig(BaseVectorStoreConfig, BaseAWSLLM):
        if isinstance(query, list):
            query = " ".join(query)

-        url = f"{api_base}/{vector_store_id}/retrieve"
+        encoded_vector_store_id = encode_url_path_segment(
+            vector_store_id, field_name="vector_store_id"
+        )
+        url = f"{api_base}/{encoded_vector_store_id}/retrieve"

        request_body: Dict[str, Any] = {
            "retrievalQuery": BedrockKBRetrievalQuery(text=query),
--- a/litellm/llms/black_forest_labs/image_edit/handler.py
+++ b/litellm/llms/black_forest_labs/image_edit/handler.py
@ -15,6 +15,7 @@ import httpx
 import litellm
 from litellm._logging import verbose_logger
 from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
+from litellm.litellm_core_utils.url_utils import SSRFError, assert_same_origin
 from litellm.llms.custom_httpx.http_handler import (
    AsyncHTTPHandler,
    HTTPHandler,
@ -331,6 +332,17 @@ class BlackForestLabsImageEdit:
                message="No polling_url in BFL response",
            )

+        # Reject cross-origin polling URLs — the ``x-key`` auth header
+        # would otherwise leak to whatever URL the upstream returns.
+        # VERIA-51.
+        try:
+            assert_same_origin(polling_url, str(initial_response.request.url))
+        except SSRFError as ssrf_err:
+            raise BlackForestLabsError(
+                status_code=502,
+                message=f"Rejected polling URL: {ssrf_err}",
+            )
+
        # Get just the auth header for polling
        polling_headers = {"x-key": headers.get("x-key", "")}

@ -416,6 +428,17 @@ class BlackForestLabsImageEdit:
                message="No polling_url in BFL response",
            )

+        # Reject cross-origin polling URLs — the ``x-key`` auth header
+        # would otherwise leak to whatever URL the upstream returns.
+        # VERIA-51.
+        try:
+            assert_same_origin(polling_url, str(initial_response.request.url))
+        except SSRFError as ssrf_err:
+            raise BlackForestLabsError(
+                status_code=502,
+                message=f"Rejected polling URL: {ssrf_err}",
+            )
+
        # Get just the auth header for polling
        polling_headers = {"x-key": headers.get("x-key", "")}
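
Reviewer note: both polling paths now reject a `polling_url` whose origin differs from the initial request, so the `x-key` header is never sent elsewhere. The body of `assert_same_origin` is not in this diff. The sketch below is one common way to compare origins (scheme, host, effective port) with the standard library, and `is_same_origin` is a hypothetical name.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def is_same_origin(candidate: str, trusted: str) -> bool:
    """Hypothetical check: True only when both URLs share one origin."""
    candidate_origin = _origin(candidate)
    # A URL without a host (for example a relative path) never matches.
    return bool(candidate_origin[1]) and candidate_origin == _origin(trusted)


same = is_same_origin(
    "https://api.example.com/poll?id=1", "https://api.example.com:443/v1/edit"
)
cross = is_same_origin(
    "https://evil.example.net/poll", "https://api.example.com/v1/edit"
)
```

Normalizing the default port means `https://host` and `https://host:443` compare equal, while a scheme downgrade or a different host is rejected.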
