Split the monolithic LiteLLM proxy into independently scalable Kubernetes components to allow separate horizontal scaling of the LLM data plane and management API surfaces

- Add DatabaseURLSettings pydantic-settings model that assembles DATABASE_URL (and optional DATABASE_URL_READ_REPLICA) from discrete DATABASE_* env vars before Prisma initializes, supporting both IAM token auth (minting short-lived RDS tokens) and password auth; replaces the CLI-only path that componentized entrypoints bypass
- Add gateway component (port 4000) that trims the proxy route table to the LLM data-plane surface (chat, embeddings, completions, audio, realtime, provider passthroughs, health/metrics) via an allowlist applied inside the lifespan context so plugin-registered routes are captured
- Add backend component (port 4001) that exposes the management/admin surface (keys, users, teams, orgs, spend analytics, model management, SSO, audit logs) with a complementary allowlist
- Add ui component: Next.js static export served by nginx (port 3000) with RSC payload routing, asset prefix aliasing, and SPA fallback for dashboard routes
- Add migrations component with dedicated Dockerfile that runs prisma migrate deploy via a Helm pre-install/pre-upgrade Job, eliminating per-pod schema contention on the Prisma advisory lock
- Add Helm chart (helm/litellm) with separate Deployments, Services, HPAs, and ConfigMap for each component; shared _helpers.tpl emits DATABASE_*, IAM_TOKEN_DB_AUTH, REDIS_*, and DISABLE_SCHEMA_UPDATE env vars from chart values; ingress template routes traffic to the correct component by path prefix
- Add comprehensive tests for DatabaseURLSettings covering IAM auth, password auth, read replica fallbacks, operator-pinned URL preservation, and percent-encoding; add coverage test asserting gateway + backend allowlist union equals the full proxy route set
- Add pydantic-settings>=2.14.1 as a proxy extra dependency and update liccheck allowlist

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
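The password-auth path described above (assemble the URL from discrete variables, percent-encode the credentials, and leave an operator-pinned URL untouched) can be sketched roughly as follows. This is not the `DatabaseURLSettings` implementation: the function name `build_database_url` and the variable names `DATABASE_USERNAME`, `DATABASE_PASSWORD`, `DATABASE_HOST`, `DATABASE_PORT`, and `DATABASE_NAME` are assumptions made for illustration, and IAM token minting is left out entirely.

```python
from typing import Mapping
from urllib.parse import quote


def build_database_url(env: Mapping[str, str]) -> str:
    """Assemble a PostgreSQL URL from discrete settings (password auth only)."""
    # An operator-pinned DATABASE_URL takes precedence over the discrete values.
    pinned = env.get("DATABASE_URL")
    if pinned:
        return pinned

    # Hypothetical variable names; the commit message only says "DATABASE_*".
    # safe="" makes quote() escape "/", ":" and "@" so that credentials cannot
    # be mistaken for URL delimiters.
    user = quote(env["DATABASE_USERNAME"], safe="")
    password = quote(env["DATABASE_PASSWORD"], safe="")
    host = env["DATABASE_HOST"]
    port = env.get("DATABASE_PORT", "5432")
    name = quote(env.get("DATABASE_NAME", "litellm"), safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


example = build_database_url(
    {
        "DATABASE_USERNAME": "litellm",
        "DATABASE_PASSWORD": "p@ss/word:1",
        "DATABASE_HOST": "db.internal",
    }
)
print(example)
# postgresql://litellm:p%40ss%2Fword%3A1@db.internal:5432/litellm
```

Without the percent-encoding step, a password containing `@` or `/` would shift where the host part of the URL appears to begin, which is presumably why the commit tests that case explicitly.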
60 lines
2.2 KiB
Python
"""Gateway entrypoint.

Reuses the existing FastAPI app from `litellm.proxy.proxy_server` and trims its
route table to just the LLM data-plane surface. The trim is purely additive:
no existing module is modified, the full app continues to work via the legacy
entrypoint (`litellm.proxy.proxy_server:app`).

Run with:
    uvicorn gateway.main:app --host 0.0.0.0 --port 4000
"""

from contextlib import asynccontextmanager

from fastapi.routing import Mount

# Assemble DATABASE_URL (+ DATABASE_URL_READ_REPLICA) from the discrete
# DATABASE_* env vars before proxy_server imports spin up Prisma. Handles
# both IAM (mint a token) and password auth, writer and reader. The standard
# CLI flow does this in proxy_cli.py; we bypass proxy_cli by uvicorn'ing the
# app directly, so without this Prisma initializes with the placeholder URL
# and every DB-needing endpoint returns "Database not connected".
from litellm.proxy.db.db_url_settings import DatabaseURLSettings

DatabaseURLSettings.from_env().apply_to_env()

from litellm.proxy.proxy_server import app

from gateway.routes.allowlist import GATEWAY_EXACT_PATHS, GATEWAY_PATH_PREFIXES


def _is_gateway_route(route) -> bool:
    """Keep the route on the gateway if its path is in the LLM data-plane surface."""
    path = getattr(route, "path", None)
    if path is None:
        return False
    if isinstance(route, Mount):
        # Gateway never serves the static UI or its asset bundles.
        return False
    if path in GATEWAY_EXACT_PATHS:
        return True
    return any(path.startswith(prefix) for prefix in GATEWAY_PATH_PREFIXES)


# Wrap proxy_server's existing lifespan so the route trim runs *after* its
# startup hooks (and any plugin code those hooks load) have had a chance to
# register routes. A module-load filter would miss routes added during
# startup; running inside the lifespan, after the inner __aenter__, catches
# them while still completing before uvicorn opens the listener.
_proxy_lifespan = app.router.lifespan_context


@asynccontextmanager
async def _gateway_lifespan(app_):
    async with _proxy_lifespan(app_):
        app_.router.routes = [r for r in app_.router.routes if _is_gateway_route(r)]
        yield


app.router.lifespan_context = _gateway_lifespan
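The ordering argument in the comments above (trim after the inner startup rather than at module load) can be checked without FastAPI. The sketch below is a stdlib-only model, not the gateway code: `FakeApp`, `inner_lifespan`, `trimmed_lifespan`, and the allowlist values are all hypothetical stand-ins, with routes reduced to plain path strings.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from typing import List

# Hypothetical allowlist; the real values live in gateway.routes.allowlist.
EXACT_PATHS = frozenset({"/health"})
PATH_PREFIXES = ("/v1/chat/",)


@dataclass
class FakeApp:
    # Routes that exist at import time, before any startup hook runs.
    routes: List[str] = field(default_factory=lambda: ["/health", "/key/generate"])


def is_allowed(path: str) -> bool:
    # str.startswith accepts a tuple, so one call covers every prefix.
    return path in EXACT_PATHS or path.startswith(PATH_PREFIXES)


@asynccontextmanager
async def inner_lifespan(app: FakeApp):
    # Stand-in for a startup hook that registers routes late, as a plugin might.
    app.routes += ["/v1/chat/completions", "/plugin/admin"]
    yield


@asynccontextmanager
async def trimmed_lifespan(app: FakeApp):
    async with inner_lifespan(app):
        # Runs after the inner startup, so late-registered routes are filtered too.
        app.routes = [p for p in app.routes if is_allowed(p)]
        yield


async def main() -> List[str]:
    app = FakeApp()
    async with trimmed_lifespan(app):
        return list(app.routes)


served = asyncio.run(main())
print(served)
# ['/health', '/v1/chat/completions']
```

Had the filter run before `inner_lifespan` was entered, `/plugin/admin` would have been registered afterwards and remained reachable. Note also that prefix matching is purely textual: a prefix without a trailing `/` (for example `/v1/chat`) would also admit a sibling path such as `/v1/chatter`.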