litellm/tests/test_litellm/proxy/test_mcp_asgi_response.py
Sameer Kankute 466f06df6d
fix(mcp): surface upstream 401 for token-forwarding MCP servers (#27847)
* fix(mcp): surface upstream 401 for token-forwarding MCP servers

For MCP servers configured with extra_headers: [Authorization], the gateway
forwards the client token directly to the upstream. When that token is rejected
(expired or invalid) the upstream returns 401, but the MCP SDK starts the SSE
stream with 200 OK before calling handlers, so the 401 can't be returned
mid-stream.

Fix: add a pre-flight httpx probe in handle_streamable_http_mcp — before the
SDK opens the session — so the gateway can still return HTTP 401 with
WWW-Authenticate: Bearer authorization_uri=<gateway-discovery-url> when the
upstream rejects the token. The probe fails open (returns 200) on network
errors so a transient hiccup does not block valid requests.

Co-authored-by: Cursor <cursoragent@cursor.com>
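
The fail-open behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: `probe_upstream_auth`, the `SendProbe` callable, and the example URL are hypothetical names, and the HTTP call is replaced by an injected coroutine so the sketch runs with the standard library only.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for the real HTTP call: resolves to the upstream status code.
SendProbe = Callable[[str, str], Awaitable[int]]


async def probe_upstream_auth(send_probe: SendProbe, url: str, authorization: str) -> int:
    """Return 401 only when the upstream explicitly rejects the token, else 200."""
    try:
        status = await send_probe(url, authorization)
    except Exception:
        # Fail open: a transient network error must not block a possibly valid request.
        return 200
    return 401 if status == 401 else 200


async def _rejecting(url: str, authorization: str) -> int:
    # Simulates an upstream that rejects the forwarded token.
    return 401


async def _unreachable(url: str, authorization: str) -> int:
    # Simulates a network failure before any status is received.
    raise ConnectionError("upstream unreachable")


rejected = asyncio.run(
    probe_upstream_auth(_rejecting, "https://upstream.test/mcp", "Bearer expired")
)
failed_open = asyncio.run(
    probe_upstream_auth(_unreachable, "https://upstream.test/mcp", "Bearer ok")
)
```

In this sketch only an explicit upstream 401 blocks the request; every other outcome, including an exception, lets the SDK session proceed.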

* fix(mcp): parallelize pre-flight auth probes and use HEAD to avoid side effects

- Extract forwarded_auth outside the pass-through server loop (was called N times for the same scope value)
- Gather all upstream auth probes concurrently with asyncio.gather instead of sequentially; eliminates N×5 s worst-case latency
- Switch probe from POST+initialize JSON-RPC body to HEAD request; HEAD carries the Authorization header so the upstream rejects invalid tokens with 401 but never allocates a session or writes an audit entry

Co-authored-by: Cursor <cursoragent@cursor.com>
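
As a rough illustration of the concurrency change above, the sketch below gathers several simulated probes with `asyncio.gather`. The `probe` and `probe_all` helpers and the server names are hypothetical, and `asyncio.sleep` stands in for the upstream round trip.

```python
import asyncio
import time
from typing import Dict, List, Tuple


async def probe(server: str, delay: float) -> Tuple[str, int]:
    # Hypothetical probe: sleeps instead of issuing a real HTTP request.
    await asyncio.sleep(delay)
    return server, 200


async def probe_all(servers: Dict[str, float]) -> List[Tuple[str, int]]:
    # gather() schedules all probes at once and returns results in input order.
    return await asyncio.gather(*(probe(name, delay) for name, delay in servers.items()))


start = time.perf_counter()
results = asyncio.run(probe_all({"a": 0.05, "b": 0.05, "c": 0.05}))
elapsed = time.perf_counter() - start
```

Because the probes overlap, `elapsed` should stay close to the slowest single probe rather than the sum of all delays, which is the property that removes the N×5 s worst case.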

* fix(mcp): use get_async_httpx_client in _probe_upstream_auth

Replaces bare httpx.AsyncClient with the project-standard
get_async_httpx_client(httpxSpecialProvider.MCP) to satisfy the
ensure_async_clients_test code coverage check and avoid the +500 ms
per-request overhead of creating a new client on every probe call.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(mcp): extract pre-flight probe into _check_passthrough_upstream_auth

Moves the parallel upstream auth probe logic out of
handle_streamable_http_mcp into a dedicated helper to satisfy
Ruff PLR0915 (Too many statements > 50).

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mcp): gate pre-flight probes on authorized server set to prevent bypass

_check_passthrough_upstream_auth was resolving user-supplied server names
directly before authorization ran, letting any permitted LiteLLM key
trigger an upstream HEAD probe to a server it was not allowed to use.

Changes:
- Call _get_allowed_mcp_servers inside the helper so only servers the
  caller's key is authorized for are probed.
- Move the call site to after toolset scoping so the auth context is
  fully resolved before the probe list is built.
- Thread user_api_key_auth into the helper signature (replaces the raw
  mcp_servers name list).

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add async HTTP HEAD support

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): use Scope type annotation in _get_forwarded_auth_from_scope

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix MCP upstream auth probe method

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* Remove unused AsyncHTTPHandler head method

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): exclude has_client_credentials servers from pre-flight auth probe

_prepare_mcp_server_headers skips caller Authorization when the server
uses OAuth client-credentials (M2M), but the pre-flight probe was still
selecting those servers and forwarding the caller's raw token in the HEAD
request. Exclude servers with has_client_credentials from the probe list
to match the actual downstream header-preparation logic.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(mcp): propagate upstream 403 as 403, not 401 with WWW-Authenticate

Per RFC 9110, 401 means "go get new credentials." Mapping an upstream 403
to a gateway 401 causes OAuth clients to restart the authorization flow,
obtain a fresh token with identical scopes, hit 403 again, and loop
indefinitely.

401 from upstream → gateway 401 + WWW-Authenticate (re-authorize)
403 from upstream → gateway 403 (no WWW-Authenticate hint)

Co-authored-by: Cursor <cursoragent@cursor.com>
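
The status mapping above could be expressed along these lines. This is a sketch under assumptions: `map_upstream_status` and the discovery URL are hypothetical, and the real helper may structure its return value differently.

```python
from typing import Dict, Optional, Tuple


def map_upstream_status(
    upstream_status: int, discovery_url: str
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Translate an upstream probe status into a gateway (status, headers) pair."""
    if upstream_status == 401:
        # Token rejected: tell the client where to re-authorize.
        return 401, {"WWW-Authenticate": f"Bearer authorization_uri={discovery_url}"}
    if upstream_status == 403:
        # Token valid but not permitted: no re-authorization hint, to avoid an OAuth loop.
        return 403, {}
    # Anything else lets the request continue.
    return None


unauthorized = map_upstream_status(401, "https://gateway.test/discovery")
forbidden = map_upstream_status(403, "https://gateway.test/discovery")
allowed = map_upstream_status(200, "https://gateway.test/discovery")
```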

* fix(mcp): skip auth probe when Authorization may be the LiteLLM proxy key

The pre-flight upstream probe must not forward the caller's Authorization
header when it could itself be the LiteLLM proxy API key. Restrict the
probe to requests that supply x-litellm-api-key explicitly — only then is
the Authorization header unambiguously the upstream OAuth token the
caller wants forwarded.
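
A minimal sketch of that gating rule, assuming headers are read from a raw ASGI scope (a list of lowercase byte pairs). The function name `forwardable_authorization` is hypothetical and is not the helper used in the gateway.

```python
from typing import Any, Dict, Optional


def forwardable_authorization(scope: Dict[str, Any]) -> Optional[str]:
    """Return the Authorization value only when x-litellm-api-key is also present."""
    headers = {
        key.decode("latin-1").lower(): value.decode("latin-1")
        for key, value in scope.get("headers", [])
    }
    if "x-litellm-api-key" not in headers:
        # Authorization might be the LiteLLM proxy key itself, so never forward it.
        return None
    return headers.get("authorization")


explicit_key_scope = {
    "type": "http",
    "headers": [
        (b"x-litellm-api-key", b"sk-123"),
        (b"authorization", b"Bearer upstream-token"),
    ],
}
ambiguous_scope = {"type": "http", "headers": [(b"authorization", b"Bearer sk-123")]}

with_key = forwardable_authorization(explicit_key_scope)
without_key = forwardable_authorization(ambiguous_scope)
```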

* Fix MCP ASGI HTTPException propagation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(mcp): use public AsyncHTTPHandler.post() in auth probe

Use AsyncHTTPHandler.post() and catch httpx.HTTPStatusError explicitly so
the 401/403 we want to surface is not silently swallowed by the broad
fail-open except Exception block. Avoids reaching into the handler's
private client attribute, which would silently regress to fail-open if
AsyncHTTPHandler is ever refactored.

* Fix MCP auth probe tests

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(mcp): add coverage for httpx.HTTPStatusError path in auth probe

AsyncHTTPHandler.post() calls raise_for_status() internally, so a real
upstream 401/403 lands as httpx.HTTPStatusError. Add a test that exercises
that specific exception path so a regression that swallows the error in
the broad fail-open except Exception would be caught.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: claude-bot <claude-bot@anthropic.com>
2026-05-13 12:03:36 -07:00


import asyncio

import pytest
from fastapi import HTTPException

from litellm.proxy.proxy_server import _stream_mcp_asgi_response


@pytest.mark.asyncio
async def test_stream_mcp_asgi_response_propagates_pre_header_http_exception():
    async def handle_fn(_scope, _receive, _send):
        raise HTTPException(
            status_code=401,
            detail="Unauthorized",
            headers={
                "WWW-Authenticate": "Bearer authorization_uri=https://example.test/auth"
            },
        )

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    with pytest.raises(HTTPException) as exc_info:
        await asyncio.wait_for(
            _stream_mcp_asgi_response(
                handle_fn,
                {"type": "http", "method": "POST", "path": "/mcp", "headers": []},
                receive,
            ),
            timeout=1.0,
        )

    assert exc_info.value.status_code == 401
    assert exc_info.value.headers == {
        "WWW-Authenticate": "Bearer authorization_uri=https://example.test/auth"
    }