Split the monolithic LiteLLM proxy into independently scalable Kubernetes components to allow separate horizontal scaling of the LLM data plane and management API surfaces

- Add DatabaseURLSettings pydantic-settings model that assembles DATABASE_URL (and optional DATABASE_URL_READ_REPLICA) from discrete DATABASE_* env vars before Prisma initializes, supporting both IAM token auth (minting short-lived RDS tokens) and password auth; replaces the CLI-only path that componentized entrypoints bypass
- Add gateway component (port 4000) that trims the proxy route table to the LLM data-plane surface (chat, embeddings, completions, audio, realtime, provider passthroughs, health/metrics) via an allowlist applied inside the lifespan context so plugin-registered routes are captured
- Add backend component (port 4001) that exposes the management/admin surface (keys, users, teams, orgs, spend analytics, model management, SSO, audit logs) with a complementary allowlist
- Add ui component: Next.js static export served by nginx (port 3000) with RSC payload routing, asset prefix aliasing, and SPA fallback for dashboard routes
- Add migrations component with dedicated Dockerfile that runs prisma migrate deploy via a Helm pre-install/pre-upgrade Job, eliminating per-pod schema contention on the Prisma advisory lock
- Add Helm chart (helm/litellm) with separate Deployments, Services, HPAs, and ConfigMap for each component; shared _helpers.tpl emits DATABASE_*, IAM_TOKEN_DB_AUTH, REDIS_*, and DISABLE_SCHEMA_UPDATE env vars from chart values; ingress template routes traffic to the correct component by path prefix
- Add comprehensive tests for DatabaseURLSettings covering IAM auth, password auth, read replica fallbacks, operator-pinned URL preservation, and percent-encoding; add coverage test asserting gateway + backend allowlist union equals the full proxy route set
- Add pydantic-settings>=2.14.1 as a proxy extra dependency and update liccheck allowlist

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
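The password-auth path of the DATABASE_URL assembly described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the pydantic-settings model from this change. The variable names DATABASE_USER, DATABASE_PASSWORD, DATABASE_HOST, DATABASE_PORT and DATABASE_NAME, the default port, and both function names are assumptions made for the example. IAM token minting and the read replica URL are left out.

```python
from __future__ import annotations

from urllib.parse import quote


def build_database_url(env: dict[str, str]) -> str:
    """Assemble a PostgreSQL URL from discrete settings (hypothetical names)."""
    # Percent-encode credentials so characters such as '@', ':' or '/'
    # in a password are not mistaken for URL delimiters.
    user = quote(env["DATABASE_USER"], safe="")
    password = quote(env["DATABASE_PASSWORD"], safe="")
    host = env["DATABASE_HOST"]
    port = env.get("DATABASE_PORT", "5432")  # assumed default
    name = quote(env["DATABASE_NAME"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


def resolve_database_url(env: dict[str, str]) -> str:
    """Prefer an operator-pinned DATABASE_URL; otherwise assemble one."""
    pinned = env.get("DATABASE_URL")
    if pinned:
        return pinned  # preserved exactly as the operator wrote it
    return build_database_url(env)
```

Calling `resolve_database_url(dict(os.environ))` before the database client starts would mirror the ordering the commit message describes, with the assembled value then exported as DATABASE_URL.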
100 lines
3.6 KiB
Nginx Configuration File
worker_processes auto;
events { worker_connections 1024; }

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  sendfile on;
  tcp_nopush on;
  keepalive_timeout 65;

  gzip on;
  gzip_comp_level 4;
  gzip_min_length 1024;
  gzip_proxied any;
  # text/html is always compressed by nginx; listing it in gzip_types
  # only triggers a "duplicate MIME type" warning, so it is omitted.
  gzip_types
    application/javascript
    application/json
    text/css
    image/svg+xml
    font/woff
    font/woff2;

  server {
    listen 3000 default_server;
    server_name _;
    root /usr/share/nginx/html;

    # next.config.mjs sets assetPrefix=/litellm-asset-prefix, which makes
    # the built HTML reference /litellm-asset-prefix/_next/..., but the
    # static export only emits files under /_next/. Map the prefix to
    # the real tree at request time instead of duplicating the directory
    # at build time. NB: alias rewrites the location prefix, so
    # /litellm-asset-prefix/_next/foo.js → /usr/share/nginx/html/_next/foo.js.
    location /litellm-asset-prefix/_next/ {
      alias /usr/share/nginx/html/_next/;
      expires 1y;
      add_header Cache-Control "public, immutable";
    }

    # Content-hashed asset bundles: cache forever.
    location /_next/ {
      try_files $uri =404;
      expires 1y;
      add_header Cache-Control "public, immutable";
    }
    location /assets/ {
      try_files $uri =404;
      expires 1y;
      add_header Cache-Control "public, immutable";
    }
    location = /favicon.ico {
      try_files $uri =404;
      expires 1d;
    }

    # Probe target; doesn't depend on disk.
    location = /healthz { default_type text/plain; return 200 "ok\n"; }

    # Next.js App Router (output: "export") emits an RSC/flight payload
    # as <route>.txt next to <route>.html, plus __next.*.txt segment
    # data. The client router fetches these on soft navigation/prefetch
    # (?_rsc=<hash>); the query string is irrelevant, files resolve by
    # $uri. These MUST be served from the export: if they fall through
    # to the catch-all 404 below, client-side navigation never settles
    # and the login flow spins in an infinite redirect loop
    # (/ ⇄ /ui/login). Keep this BEFORE the /ui/ regex: ^/ui/(.+)$ is
    # also a regex and nginx takes the first matching one, so a stray
    # /ui/<page>.txt would otherwise be rewritten to HTML and break RSC
    # for nested routes. A genuinely missing payload must 404 (the
    # router degrades to a hard navigation); never fall back to HTML.
    location ~ \.txt$ {
      try_files $uri =404;
    }

    # /ui[/<page>]: the dashboard's JS hardcodes URLs under this prefix
    # (router.replace("/ui"), buildLoginUrlWithReturn("/ui/login"), ...).
    # Mirror what FastAPI StaticFiles(mount="/ui") did in the monolithic
    # proxy_server: serve /ui/<page> from out/<page>.html, with App
    # Router-aware fallback (out/<page>/index.html) and a final SPA
    # fallback to out/index.html for client-side routes.
    location = /ui { try_files /index.html =404; }
    location = /ui/ { try_files /index.html =404; }
    location ~ ^/ui/(.+)$ {
      try_files /$1.html /$1/index.html /index.html =404;
    }

    # `/` is handy for direct-debug port-forwards.
    location = / { try_files /index.html =404; }

    # Anything else (API calls etc.) returns 404 from the UI's
    # perspective. A reverse proxy in front of this image routes the
    # API surface (/v1, /key, /.well-known/litellm-ui-config, ...) to
    # gateway/backend before requests get here; if something slips
    # through, fall through to a 404 instead of accidentally serving
    # HTML and confusing a JSON-expecting caller.
    location / { return 404; }
  }
}
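
The routing rules in this file can be checked by hand with a small model of nginx location selection. The sketch below is a simplified Python approximation, not part of the image. The function name `resolve` and the idea of passing the exported files as a set of paths are assumptions made for illustration. It applies exact-match locations first, then the regex locations in file order, then the prefix locations, which matches nginx's precedence for this config because no prefix location uses `^~`. Caching headers, MIME types and directory handling are ignored.

```python
from __future__ import annotations

import re
from typing import Optional, Tuple

Result = Tuple[int, Optional[str]]


def resolve(uri: str, files: set[str]) -> Result:
    """Return (status, served file) for a request path, query string removed."""

    def try_files(*candidates: str) -> Result:
        # Like try_files ... =404: the first existing candidate wins.
        for candidate in candidates:
            if candidate in files:
                return 200, candidate
        return 404, None

    # 1. Exact-match locations.
    if uri == "/healthz":
        return 200, None  # answered inline, no file involved
    if uri in ("/", "/ui", "/ui/"):
        return try_files("/index.html")
    if uri == "/favicon.ico":
        return try_files(uri)

    # 2. Regex locations in file order: the .txt rule precedes the /ui/ rule.
    if re.search(r"\.txt$", uri):
        return try_files(uri)
    match = re.match(r"^/ui/(.+)$", uri)
    if match:
        page = match.group(1)
        return try_files(f"/{page}.html", f"/{page}/index.html", "/index.html")

    # 3. Prefix locations.
    if uri.startswith("/litellm-asset-prefix/_next/"):
        # alias: strip the asset prefix and serve from the real /_next/ tree.
        return try_files(uri[len("/litellm-asset-prefix"):])
    if uri.startswith(("/_next/", "/assets/")):
        return try_files(uri)

    # 4. Catch-all: anything else is not the UI's concern.
    return 404, None
```

With a hypothetical export containing `/index.html`, `/login.html` and `/login.txt`, `resolve("/ui/login", files)` serves `/login.html`, `resolve("/login.txt", files)` serves the RSC payload, and `resolve("/ui/missing.txt", files)` returns 404 rather than falling back to HTML, which is the behavior the comments above require.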