Split the monolithic LiteLLM proxy into independently scalable Kubernetes components to allow separate horizontal scaling of the LLM data plane and management API surfaces - Add DatabaseURLSettings pydantic-settings model that assembles DATABASE_URL (and optional DATABASE_URL_READ_REPLICA) from discrete DATABASE_* env vars before Prisma initializes, supporting both IAM token auth (minting short-lived RDS tokens) and password auth; replaces the CLI-only path that componentized entrypoints bypass - Add gateway component (port 4000) that trims the proxy route table to the LLM data-plane surface (chat, embeddings, completions, audio, realtime, provider passthroughs, health/metrics) via an allowlist applied inside the lifespan context so plugin-registered routes are captured - Add backend component (port 4001) that exposes the management/admin surface (keys, users, teams, orgs, spend analytics, model management, SSO, audit logs) with a complementary allowlist - Add ui component — Next.js static export served by nginx (port 3000) with RSC payload routing, asset prefix aliasing, and SPA fallback for dashboard routes - Add migrations component with dedicated Dockerfile that runs prisma migrate deploy via a Helm pre-install/pre-upgrade Job, eliminating per-pod schema contention on the Prisma advisory lock - Add Helm chart (helm/litellm) with separate Deployments, Services, HPAs, and ConfigMap for each component; shared _helpers.tpl emits DATABASE_*, IAM_TOKEN_DB_AUTH, REDIS_*, and DISABLE_SCHEMA_UPDATE env vars from chart values; ingress template routes traffic to the correct component by path prefix - Add comprehensive tests for DatabaseURLSettings covering IAM auth, password auth, read replica fallbacks, operator-pinned URL preservation, and percent-encoding; add coverage test asserting gateway + backend allowlist union equals the full proxy route set - Add pydantic-settings>=2.14.1 as a proxy extra dependency and update liccheck allowlist Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
99 lines
4.2 KiB
Docker
99 lines
4.2 KiB
Docker
ARG LITELLM_BUILD_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:31da6565f35af6401031c1d7aa91dc84ac76c5c48edd17fb90f0ed9e3173c7a9
|
|
ARG LITELLM_RUNTIME_IMAGE=cgr.dev/chainguard/wolfi-base@sha256:31da6565f35af6401031c1d7aa91dc84ac76c5c48edd17fb90f0ed9e3173c7a9
|
|
ARG UV_IMAGE=ghcr.io/astral-sh/uv:0.11.7@sha256:240fb85ab0f263ef12f492d8476aa3a2e4e1e333f7d67fbdd923d00a506a516a
|
|
|
|
FROM $UV_IMAGE AS uvbin
|
|
|
|
# ---------- Builder ----------
|
|
#
|
|
# Minimal install for `prisma migrate deploy`. We deliberately skip the heavy
|
|
# `proxy-runtime` (otel, sentry, ddtrace, pypdf, google-genai, anthropic-vertex,
|
|
# ...) and `semantic-router` extras that the gateway/backend pull in — the
|
|
# migration engine doesn't need them. We DO install `--extra proxy` so the
|
|
# DB-URL helper from `litellm.proxy.auth.rds_iam_token` is importable, which
|
|
# is how the gateway and backend assemble `DATABASE_URL` at pod startup when
|
|
# `IAM_TOKEN_DB_AUTH=true` (see backend/main.py:17, gateway/main.py:22). And
|
|
# `--extra extra_proxy` provides the `prisma` CLI + the secret-manager
|
|
# backends `litellm.secret_managers.main` lazily imports.
|
|
#
|
|
# `prisma generate` runs once at BUILD time to (a) install the Node-based
|
|
# Prisma CLI into the binary cache and (b) download the migration / query
|
|
# engine binaries. The Python client it also produces is unused by this
|
|
# image's runtime entrypoint — that's fine, it's a few hundred KB and the
|
|
# alternative (`prisma py fetch`) doesn't reliably trigger engine downloads
|
|
# under nodeenv. Crucially we do NOT run `prisma generate` at RUNTIME; the
|
|
# old migration job did, on every pod start, which is the wasteful behaviour
|
|
# the componentization is fixing.
|
|
FROM $LITELLM_BUILD_IMAGE AS builder
|
|
|
|
WORKDIR /app
|
|
USER root
|
|
|
|
COPY --from=uvbin /uv /uvx /usr/local/bin/
|
|
|
|
RUN apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile
|
|
|
|
ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
|
|
UV_LINK_MODE=copy \
|
|
UV_COMPILE_BYTECODE=1 \
|
|
UV_PYTHON_DOWNLOADS=0 \
|
|
PATH="/app/.venv/bin:${PATH}"
|
|
|
|
# Stage 1 — install third-party deps only (cached by pyproject.toml/uv.lock).
|
|
RUN --mount=type=cache,target=/root/.cache/uv \
|
|
--mount=type=bind,source=pyproject.toml,target=pyproject.toml \
|
|
--mount=type=bind,source=uv.lock,target=uv.lock \
|
|
--mount=type=bind,source=enterprise/pyproject.toml,target=enterprise/pyproject.toml \
|
|
--mount=type=bind,source=litellm-proxy-extras/pyproject.toml,target=litellm-proxy-extras/pyproject.toml \
|
|
uv sync --frozen --no-install-project --no-install-workspace --no-default-groups --no-editable \
|
|
--extra proxy \
|
|
--extra extra_proxy \
|
|
--python python3
|
|
|
|
# Stage 2 — copy source and install the project + workspace members.
|
|
COPY . .
|
|
|
|
RUN --mount=type=cache,target=/root/.cache/uv \
|
|
uv sync --frozen --no-default-groups --no-editable \
|
|
--extra proxy \
|
|
--extra extra_proxy \
|
|
--python python3
|
|
|
|
COPY migrations/run.py /app/run.py
|
|
|
|
# Pre-warm the Prisma binary cache so the Job pod doesn't reach the
|
|
# internet on first start. This matches what the backend Dockerfile does:
|
|
# `prisma generate` runs nodeenv (downloads Node), installs the prisma npm
|
|
# CLI, downloads the engine binaries for each `binaryTarget` in
|
|
# schema.prisma, AND emits the generated Python client. We don't need the
|
|
# client at runtime — the migration job invokes `prisma migrate deploy`
|
|
# via subprocess — but having it cached is harmless and the alternative
|
|
# (`prisma py fetch`) doesn't reliably trigger engine downloads.
|
|
RUN mkdir -p /home/nonroot && \
|
|
HOME=/home/nonroot prisma generate --schema=./schema.prisma && \
|
|
chown -R nonroot:nonroot /home/nonroot/.cache
|
|
|
|
# ---------- Runtime ----------
|
|
FROM $LITELLM_RUNTIME_IMAGE AS runtime
|
|
|
|
USER root
|
|
|
|
RUN apk add --no-cache bash openssl tzdata python3 libsndfile libatomic
|
|
|
|
# wolfi-base ships an unprivileged `nonroot` account (UID/GID 65532). The
|
|
# Prisma engine binaries are dynamically linked against libssl/libcrypto, so
|
|
# openssl stays in the runtime layer.
|
|
WORKDIR /app
|
|
ENV HOME=/home/nonroot \
|
|
PATH="/app/.venv/bin:${PATH}" \
|
|
PYTHONPATH="/app" \
|
|
PYTHONDONTWRITEBYTECODE=1 \
|
|
PYTHONUNBUFFERED=1
|
|
|
|
COPY --from=builder --chown=nonroot:nonroot /app /app
|
|
COPY --from=builder --chown=nonroot:nonroot /home/nonroot/.cache /home/nonroot/.cache
|
|
|
|
USER nonroot
|
|
|
|
ENTRYPOINT ["python3", "/app/run.py"]
|