feat: add Terraform stacks for deploying LiteLLM on AWS and GCP (#27673)

- Add AWS ECS Fargate stack with Aurora Postgres (IAM auth), ElastiCache Redis, S3, ALB with path-based routing to gateway/backend/ui components, Application Auto Scaling, and automated DB bootstrap + prisma migration via local-exec provisioners
- Add GCP Cloud Run stack with Cloud SQL Postgres (password auth), Memorystore Redis, GCS, external HTTPS load balancer with serverless NEGs and URL map routing, and automated prisma migration via Cloud Run Job
- Both stacks support typed proxy_config input mirroring the helm chart's gateway.config.proxy_config, per-component extra env vars, and Secret Manager references for provider API keys
- Gateway/backend services depend on terraform_data.migration so they never start before the schema is in place, eliminating crash-loop windows on first apply
- AWS stack uses IAM database authentication with a one-shot Fargate bootstrap task that creates and grants the rds_iam role to the application user; GCP stack uses password auth assembled at container startup to avoid Cloud SQL Auth Proxy sidecar complexity
- Add .gitignore rules for Terraform state files, plan files, tfvars inputs, provider binaries, and crash logs while explicitly keeping .terraform.lock.hcl for provider version pinning
- Include terraform.tfvars.example files, provider lock files, and comprehensive README documentation covering architecture, TLS setup, image pull strategies, and quick-start instructions for both stacks

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Yassin Kortam 2026-05-16 17:26:20 -07:00 committed by GitHub
parent fbe0ee81f1
commit 3d5a9ede05
GPG Key ID: B5690EEEBB952194
38 changed files with 4626 additions and 0 deletions

.gitignore

@@ -101,4 +101,23 @@ STABILIZATION_TODO.md
**/*.storageState.json
**/coverage
test-config
# ---------- Terraform ----------
# Provider binaries + module cache — regenerated by `terraform init`.
**/.terraform/
# State files often contain secrets (DB passwords, API keys snapshotted from
# data sources). Keep state in a remote backend, never in git.
*.tfstate
*.tfstate.*
*.tfstate.backup
# Plan files can also contain sensitive values (variables in plaintext).
*.tfplan
# User-specific variable inputs — example files (terraform.tfvars.example) are
# tracked because they end in .example, which doesn't match the glob below.
*.tfvars
*.auto.tfvars
crash.log
crash.*.log
# .terraform.lock.hcl is intentionally NOT ignored — it pins provider versions
# and should be committed.
.vscode

terraform/litellm/README.md

@@ -0,0 +1,159 @@
# LiteLLM Terraform stacks
Two self-contained Terraform root modules that deploy the **componentized**
LiteLLM proxy — the gateway, backend, and UI as three independent containers
(see `helm/litellm/` for the canonical chart with the same split).
| Stack | Compute | Database (writer + reader) | Cache | Object store | Public entrypoint |
| ------ | ----------- | ---------------------------------- | ----------- | ------------ | ------------------ |
| `aws/` | ECS Fargate | Aurora Postgres (IAM auth) | ElastiCache | S3 | Application LB |
| `gcp/` | Cloud Run | Cloud SQL Postgres (password auth) | Memorystore | GCS | External HTTPS LB |
Each stack creates its own VPC and managed data stores — drop in a tfvars
file and run `terraform apply`. Both stacks support a typed `proxy_config`
input (mirrors `helm/litellm`'s `gateway.config.proxy_config`) and per-component
extra env vars / secret-manager refs.
## Components
The proxy is split into three deployables:
| Component | Default image | Port | Role |
| --------- | ---------------------------------------- | ---- | -------------------------------------------------------------------- |
| `gateway` | `ghcr.io/berriai/litellm-gateway:main-stable` | 4000 | LLM data plane (`/v1/chat/completions`, `/v1/embeddings`, …) |
| `backend` | `ghcr.io/berriai/litellm-backend:main-stable` | 4001 | Management API (`/key/*`, `/user/*`, `/team/*`, `/model/*`, …) |
| `ui` | `ghcr.io/berriai/litellm-ui:main-stable` | 3000 | Static Next.js dashboard served by nginx |
The load balancer routes gateway path prefixes (mirrored verbatim from
`gateway/routes/allowlist.py`) to the gateway, UI asset paths (`/`,
`/litellm-asset-prefix/*`, `/_next/*`, `/favicon.ico`) to the UI, and
everything else to the backend.
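
The routing split can be sketched as a small shell function. This is only an illustration: `route_for_path` is a hypothetical helper, and the gateway arm lists just the two prefixes named in this README, not the full set in `gateway/routes/allowlist.py`. The first two arms are disjoint, so their relative order does not change the result; the final arm is the catch-all.

```bash
# Hypothetical helper: classify a request path the way the load balancer
# rules are described above. Unmatched paths fall through to the backend.
route_for_path() {
  case "$1" in
    / | /favicon.ico | /_next/* | /litellm-asset-prefix/*)
      echo ui ;;
    /v1/chat/* | /v1/embeddings | /v1/embeddings/*)
      # Assumption: a sample of the gateway allowlist, not the whole list.
      echo gateway ;;
    *)
      echo backend ;;
  esac
}

route_for_path /v1/chat/completions   # gateway
route_for_path /key/generate          # backend
route_for_path /_next/static/app.js   # ui
```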
## Architecture
### AWS (`terraform/litellm/aws/`)
```
┌───────────────────────────────────────┐
│ Public Internet │
└─────────────────┬─────────────────────┘
│ HTTP/80
┌───────────────▼───────────────┐
│ Application Load Balancer │
│ (path-routing listener) │
└─┬─────────────┬─────────────┬─┘
│ │ │
UI assets, / │ /v1/chat, │ /key/* │
/_next/*, … │ /v1/embed, │ /user/* │
│ … │ … │
┌─────────────▼───┐ ┌──────▼──────┐ ┌───▼──────────────┐
│ ECS Service │ │ ECS Service │ │ ECS Service │
│ (ui) │ │ (gateway) │ │ (backend) │
│ Fargate :3000 │ │ Fargate:4000│ │ Fargate :4001 │
└─────────────────┘ └──────┬──────┘ └────────┬─────────┘
│ │
┌─── private subnets (one per AZ) ──────────────────────┐
│ │
│ ┌────────────────────────┐ ┌────────────────┐ │
│ │ Aurora Postgres │ │ ElastiCache │ │
│ │ cluster (IAM auth) │ │ Redis (1 node)│ │
│ │ ┌───────┐ ┌───────┐ │ └────────────────┘ │
│ │ │writer │ │reader │ │ │
│ │ └───────┘ └───────┘ │ ┌────────────────┐ │
│ └────────────────────────┘ │ S3 bucket │ │
│ │ (versioned) │ │
│ ┌────────────────────────┐ └────────────────┘ │
│ │ Secrets Manager │ │
│ │ • LITELLM_MASTER_KEY │ ┌────────────────┐ │
│ │ • DB master password │ │ One-off ECS │ │
│ │ • user-supplied API │ │ task: prisma │ │
│ │ keys (referenced) │ │ migrate deploy │ │
│ └────────────────────────┘ └────────────────┘ │
│ │
└─── VPC ───────────────────────────────────────────────┘
│ NAT gateway in one public subnet
egress to LLM providers
```
### GCP (`terraform/litellm/gcp/`)
```
┌───────────────────────────────────────┐
│ Public Internet │
└─────────────────┬─────────────────────┘
│ HTTP/80
┌───────────────▼───────────────┐
│ External HTTPS Load Balancer │
│ (global, URL map routing) │
└─┬─────────────┬─────────────┬─┘
│ │ │
│ Serverless NEGs (one per service)
│ │ │
┌─────────────▼───┐ ┌──────▼──────┐ ┌───▼──────────────┐
│ Cloud Run │ │ Cloud Run │ │ Cloud Run │
│ (ui) │ │ (gateway) │ │ (backend) │
│ :3000 │ │ :4000 │ │ :4001 │
└─────────────────┘ └──────┬──────┘ └────────┬─────────┘
│ │
│ Serverless VPC Access connector
┌─── VPC (private services access range) ──────────────────┐
│ │
│ ┌────────────────────────┐ ┌──────────────────┐ │
│ │ Cloud SQL Postgres │ │ Memorystore │ │
│ │ ┌───────┐ ┌───────┐ │ │ Redis │ │
│ │ │writer │ │reader │ │ └──────────────────┘ │
│ │ └───────┘ └───────┘ │ │
│ └────────────────────────┘ ┌──────────────────┐ │
│ │ GCS bucket │ │
│ ┌────────────────────────┐ │ (versioned) │ │
│ │ Secret Manager │ └──────────────────┘ │
│ │ • LITELLM_MASTER_KEY │ │
│ │ • DB password │ ┌──────────────────┐ │
│ │ • user-supplied API │ │ Cloud Run Job: │ │
│ │ keys (referenced) │ │ prisma migrate │ │
│ └────────────────────────┘ │ deploy │ │
│ └──────────────────┘ │
└──────────────────────────────────────────────────────────┘
```
## Images
Both stacks take per-component image references as variables. The defaults
point at the public `ghcr.io/berriai/litellm-<component>:main-stable`
images, so the stack is runnable end-to-end without pre-flight setup.
Pin to a specific tag for production. Registry reach differs per cloud:
- **AWS** can pull from any registry the task execution role can reach.
The role gets `AmazonECSTaskExecutionRolePolicy` attached, which grants
ECR pull permissions for repositories in the same account.
- **GCP Cloud Run** can only pull from Artifact Registry or
`gcr.io`-style registries. To use images hosted elsewhere, mirror them
into Artifact Registry first.
## Migrations
LiteLLM's proxy runs `prisma migrate deploy` at startup, but on first apply
the gateway/backend can race the empty database. Both stacks expose a
one-off migration task that runs `python litellm/proxy/prisma_migration.py`
against the backend image:
- AWS: an `aws_ecs_task_definition` (`litellm-migrations`). Run with
`aws ecs run-task` — the command is printed in `terraform output`.
- GCP: a `google_cloud_run_v2_job` (`litellm-migrations`). Run with
`gcloud run jobs execute` — the command is printed in `terraform output`.
Run the migration job once after the first `terraform apply` and before the
gateway/backend services start serving traffic.
## What's not included
- TLS certificates / custom domains. Both stacks expose plain-HTTP load
balancers; bring your own ACM cert (AWS) or managed cert (GCP) and wire
it into the LB resource.
- Remote state backends. Default local state — add an `s3` or `gcs`
backend block to `versions.tf` when graduating to a team environment.
- Observability beyond the cloud provider's defaults (CloudWatch logs on
AWS, Cloud Logging on GCP). Wire your own Prometheus / Datadog / Langfuse
via the `*_extra_env` variables.

@@ -0,0 +1,45 @@
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.
provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.100.0"
  constraints = "~> 5.60"
  hashes = [
    "h1:Ijt7pOlB7Tr7maGQIqtsLFbl7pSMIj06TVdkoSBcYOw=",
    "zh:054b8dd49f0549c9a7cc27d159e45327b7b65cf404da5e5a20da154b90b8a644",
    "zh:0b97bf8d5e03d15d83cc40b0530a1f84b459354939ba6f135a0086c20ebbe6b2",
    "zh:1589a2266af699cbd5d80737a0fe02e54ec9cf2ca54e7e00ac51c7359056f274",
    "zh:6330766f1d85f01ae6ea90d1b214b8b74cc8c1badc4696b165b36ddd4cc15f7b",
    "zh:7c8c2e30d8e55291b86fcb64bdf6c25489d538688545eb48fd74ad622e5d3862",
    "zh:99b1003bd9bd32ee323544da897148f46a527f622dc3971af63ea3e251596342",
    "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
    "zh:9f8b909d3ec50ade83c8062290378b1ec553edef6a447c56dadc01a99f4eaa93",
    "zh:aaef921ff9aabaf8b1869a86d692ebd24fbd4e12c21205034bb679b9caf883a2",
    "zh:ac882313207aba00dd5a76dbd572a0ddc818bb9cbf5c9d61b28fe30efaec951e",
    "zh:bb64e8aff37becab373a1a0cc1080990785304141af42ed6aa3dd4913b000421",
    "zh:dfe495f6621df5540d9c92ad40b8067376350b005c637ea6efac5dc15028add4",
    "zh:f0ddf0eaf052766cfe09dea8200a946519f653c384ab4336e2a4a64fdd6310e9",
    "zh:f1b7e684f4c7ae1eed272b6de7d2049bb87a0275cb04dbb7cda6636f600699c9",
    "zh:ff461571e3f233699bf690db319dfe46aec75e58726636a0d97dd9ac6e32fb70",
  ]
}

provider "registry.terraform.io/hashicorp/random" {
  version     = "3.8.1"
  constraints = "~> 3.6"
  hashes = [
    "h1:u8AKlWVDTH5r9YLSeswoVEjiY72Rt4/ch7U+61ZDkiQ=",
    "zh:08dd03b918c7b55713026037c5400c48af5b9f468f483463321bd18e17b907b4",
    "zh:0eee654a5542dc1d41920bbf2419032d6f0d5625b03bd81339e5b33394a3e0ae",
    "zh:229665ddf060aa0ed315597908483eee5b818a17d09b6417a0f52fd9405c4f57",
    "zh:2469d2e48f28076254a2a3fc327f184914566d9e40c5780b8d96ebf7205f8bc0",
    "zh:37d7eb334d9561f335e748280f5535a384a88675af9a9eac439d4cfd663bcb66",
    "zh:741101426a2f2c52dee37122f0f4a2f2d6af6d852cb1db634480a86398fa3511",
    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
    "zh:a902473f08ef8df62cfe6116bd6c157070a93f66622384300de235a533e9d4a9",
    "zh:b85c511a23e57a2147355932b3b6dce2a11e856b941165793a0c3d7578d94d05",
    "zh:c5172226d18eaac95b1daac80172287b69d4ce32750c82ad77fa0768be4ea4b8",
    "zh:dab4434dba34aad569b0bc243c2d3f3ff86dd7740def373f2a49816bd2ff819b",
    "zh:f49fd62aa8c5525a5c17abd51e27ca5e213881d58882fd42fec4a545b53c9699",
  ]
}

@@ -0,0 +1,254 @@
# LiteLLM on AWS (ECS Fargate)
Deploys the componentized LiteLLM proxy on AWS:
- **VPC** with public + private subnets across the AZs you pass in, one NAT gateway
- **Aurora Postgres** cluster — one writer instance + one reader instance, **IAM database authentication enabled**
- **ElastiCache Redis** (private, replication group with multi-AZ failover and at-rest + in-transit encryption) for caching + rate limiting
- **S3 bucket** (private, versioned, SSE-S3) — exposed to gateway + backend as `S3_BUCKET_NAME` / `S3_REGION_NAME` for cache backend, request log archival, and `/v1/files` storage
- **Secrets Manager** entries for `LITELLM_MASTER_KEY` (auto-generated, `sk-…`) and the Aurora master password (bootstrap-only)
- **ECS Fargate cluster** running three services — `gateway`, `backend`, `ui`
- **Application Load Balancer** (public, HTTP/80) with path-based routing:
- LLM data-plane prefixes (`/v1/chat/*`, `/v1/embeddings`, …) → `gateway`
- UI assets (`/`, `/_next/*`, `/litellm-asset-prefix/*`, …) → `ui`
- Everything else (management API: `/key/*`, `/user/*`, …) → `backend`
- **One-off migration task** (`litellm-migrations`) that runs `prisma migrate deploy` from the dedicated `ghcr.io/berriai/litellm-migrations` image
## Aurora + IAM auth
The cluster runs with `iam_database_authentication_enabled = true`. Enabling
that on the cluster doesn't by itself let any Postgres user log in with an IAM
token — you also need to `CREATE USER ... GRANT rds_iam` once. `bootstrap.tf`
does this automatically during `terraform apply` via a one-shot Fargate task
(`postgres:16-alpine` running the bootstrap SQL with the master password from
Secrets Manager). The SQL is idempotent, so re-applies are safe.
The same apply also runs the prisma schema migration via the existing
`litellm-migrations` task definition, and the gateway/backend services
`depends_on` the migration so they don't start until the schema is in place.
At runtime, the proxy assembles `DATABASE_URL` from `DATABASE_HOST/PORT/USER/NAME`
plus a short-lived IAM token — see `litellm/proxy/auth/rds_iam_token.py`. The
task role has `rds-db:connect` scoped to the IAM-authed user on the cluster.
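
As a rough illustration of that runtime step, the shell sketch below builds a connection URL from the four `DATABASE_*` parts plus a token. It is not the code in `litellm/proxy/auth/rds_iam_token.py`; the function names and the `postgresql://` URL shape are assumptions made for the example. What it shows is that an IAM token contains characters such as `/`, `=`, and `&`, so it has to be percent-encoded before it can be used as the password part of a URL.

```bash
# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes ASCII input, which holds for RDS IAM tokens.
urlencode() {
  s=$1
  out=''
  while [ -n "$s" ]; do
    rest=${s#?}          # string without its first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s' "$out"
}

# Hypothetical helper: host, port, user, db name, raw token -> URL.
build_database_url() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$3" "$(urlencode "$5")" "$1" "$2" "$4"
}

# In practice the token would come from something like:
#   aws rds generate-db-auth-token --hostname ... --port ... --username ... --region ...
# A placeholder string stands in for it here.
build_database_url db.example.internal 5432 litellm_app litellm 'placeholder/?Action=connect&DBUser=litellm_app'
```

Because the token is short-lived, a URL built this way goes stale and has to be rebuilt with a fresh token; how often the proxy refreshes it is not covered by this sketch.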
**Break-glass.** If you need to run the bootstrap or migration by hand (e.g.,
to re-apply against an externally provisioned cluster), `db_bootstrap_sql` and
`migration_run_command` are still exposed as outputs.
**Prerequisite.** `terraform apply` shells out to `aws ecs run-task` /
`aws ecs wait` in `local-exec` provisioners, so the machine running terraform
needs the `aws` CLI installed and authenticated.
## Configuring the proxy
### `proxy_config` (preferred)
Mirrors the helm chart's `gateway.config.proxy_config`. The map is YAML-encoded
and base64-passed to gateway, backend, and the migration task; each container
decodes it to `/tmp/litellm-config.yaml` at startup and sets `CONFIG_FILE_PATH`
to match.
```hcl
proxy_config = {
  model_list = [
    {
      model_name = "gpt-4o"
      litellm_params = {
        model   = "openai/gpt-4o"
        api_key = "os.environ/OPENAI_API_KEY"
      }
    },
  ]
  general_settings = {
    master_key   = "os.environ/LITELLM_MASTER_KEY"
    database_url = "os.environ/DATABASE_URL"
  }
}
```
LiteLLM resolves `os.environ/<NAME>` references in the YAML against the
container's environment. That means provider API keys belong in
`*_extra_secrets` (next section), and your YAML just references them by name.
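
The decode step each container performs at startup can be pictured roughly as below. This is a sketch under assumptions, not the stack's actual entrypoint: `write_proxy_config` is a hypothetical helper, and the name of the variable that carries the base64 payload is not stated in this README, so the payload is passed as an argument.

```bash
# Hypothetical helper: decode a base64-encoded YAML payload to a file and
# point CONFIG_FILE_PATH at it. Defaults to the path named in this README.
write_proxy_config() {
  dest=${2:-/tmp/litellm-config.yaml}
  printf '%s' "$1" | base64 -d > "$dest" || return 1
  CONFIG_FILE_PATH=$dest
  export CONFIG_FILE_PATH
}

# Usage sketch with a throwaway file instead of the real path:
demo=$(mktemp)
write_proxy_config "$(printf 'model_list: []\n' | base64)" "$demo" && cat "$demo"
rm -f "$demo"
```

Passing the config as base64 keeps multi-line YAML intact inside a single environment variable, which is why the decode happens inside the container rather than in Terraform.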
### Extra env vars
Non-sensitive plaintext (feature flags, observability hosts, etc.):
```hcl
gateway_extra_env = {
  LANGFUSE_HOST = "https://us.cloud.langfuse.com"
}
backend_extra_env = {
  STORE_MODEL_IN_DB = "True"
}
```
### Extra secrets (API keys)
Sensitive values — provider API keys, third-party tokens — live in **existing
Secrets Manager secrets**. Reference them by ARN:
```hcl
gateway_extra_secrets = {
  OPENAI_API_KEY    = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
  ANTHROPIC_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:anthropic-api-key-GhIjKl"
}
```
What happens under the hood:
- The execution role auto-gains `secretsmanager:GetSecretValue` on every ARN
listed here.
- ECS resolves each secret at task launch and injects its value into the
container as the env var named on the left.
- The `proxy_config` YAML references the resulting env var via
`os.environ/OPENAI_API_KEY`.
To pluck a single field out of a JSON secret, use ECS's `:fieldName::` suffix:
```hcl
gateway_extra_secrets = {
  OPENAI_API_KEY = "arn:…:secret:provider-keys-AbCdEf:openai_api_key::"
}
```
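
The suffix follows ECS's `valueFrom` layout for Secrets Manager, in which the secret ARN may be followed by `json-key:version-stage:version-id`. Leaving the last two fields empty selects the current version, which is why the reference ends in `::`. A minimal sketch of building such a reference, using a hypothetical helper named `secret_field_ref`:

```bash
# Hypothetical helper: append a JSON key to a secret ARN in the
# "<arn>:<json-key>:<version-stage>:<version-id>" form, leaving the
# version stage and version id empty.
secret_field_ref() {
  printf '%s:%s::\n' "$1" "$2"
}

secret_field_ref \
  'arn:aws:secretsmanager:us-west-2:111122223333:secret:provider-keys-AbCdEf' \
  openai_api_key
```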
To create the secret beforehand:
```bash
aws secretsmanager create-secret \
  --name openai-api-key \
  --secret-string "sk-proj-..."
```
## Tenant deployment
Every resource the stack creates is named `${tenant}-litellm-${env}` (or
that plus a per-resource suffix), so multiple tenants and multiple
environments coexist in the same account as long as the `(tenant, env)`
pair differs:
| `tenant` | `env` | Example resource name |
| -------- | ------- | ---------------------------------- |
| `acme` | `stage` | `acme-litellm-stage-gateway` |
| `acme` | `prod` | `acme-litellm-prod-master-key` |
| `globex` | `dev` | `globex-litellm-dev-license` |
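
One practical consequence of this naming scheme is length: AWS limits load balancer and target group names to 32 characters, and the target groups in `alb.tf` append a suffix such as `-gateway` to the base name. The sketch below checks a candidate `(tenant, env)` pair against that limit. It assumes the base name expands to exactly `${tenant}-litellm-${env}`, and both function names are hypothetical.

```bash
MAX_LB_NAME_LEN=32   # AWS limit for ALB and target group names

# Hypothetical helper: compose a suffixed resource name.
resource_name() {
  printf '%s-litellm-%s-%s' "$1" "$2" "$3"
}

# Print the name if it fits, otherwise report the overflow and fail.
check_resource_name() {
  name=$(resource_name "$1" "$2" "$3")
  if [ "${#name}" -gt "$MAX_LB_NAME_LEN" ]; then
    echo "too long (${#name} > $MAX_LB_NAME_LEN): $name" >&2
    return 1
  fi
  echo "$name"
}

check_resource_name acme stage gateway   # acme-litellm-stage-gateway
```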
For a per-tenant instance, the only inputs that change are the tenant
slug, env, and the two pre-issued secrets:
```bash
export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE
terraform apply \
  -var "region=us-west-2" \
  -var 'azs=["us-west-2a","us-west-2b"]' \
  -var "tenant=acme" \
  -var "env=stage"
```
Both `litellm_master_key` and `litellm_license` are optional:
- Omit `litellm_master_key` → the stack auto-generates a random `sk-…`
value (trial/dev path).
- Omit `litellm_license` → no license secret is created and gateway/
backend run without `LITELLM_LICENSE` (OSS-only).
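Use `TF_VAR_*` env vars rather than tfvars files for these: a tfvars file
keeps the values in plaintext on disk, where they are easy to commit by
accident or copy into example files. Whichever way they are passed, expect
the values to be recorded in `terraform.tfstate`, so protect the state
accordingly.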
## Quick start
```bash
cd terraform/litellm/aws
cp terraform.tfvars.example terraform.tfvars
# Edit: region, tenant, env, azs, *_image, proxy_config, gateway_extra_secrets.
terraform init
terraform apply
```
That single apply provisions everything, runs the DB user bootstrap, runs the
schema migration, and only then starts the gateway/backend services. When it
returns, the stack is serving traffic.
```bash
terraform output alb_url
# UI login: admin / <master key>
aws secretsmanager get-secret-value \
  --secret-id "$(terraform output -raw master_key_secret_arn)" \
  --query SecretString --output text
```
## Image pulls
The defaults pull from `ghcr.io/berriai/litellm-<component>:v1.86.0-dev`,
which is anonymous-readable. There are four images: `litellm-gateway`,
`litellm-backend`, `litellm-ui`, and `litellm-migrations` (slim image used
only by the one-off migration task — runs `prisma migrate deploy` against
the writer DB and exits). Bump them together when bumping LiteLLM. To pull
from a private registry:
- **ECR (same account)**: the execution role already has
`AmazonECSTaskExecutionRolePolicy`, which grants ECR pull for repos in
the same account. No extra config needed.
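- **ECR (cross-account)**: attach a policy to the execution role allowing
`ecr:BatchGetImage`, `ecr:GetDownloadUrlForLayer`, and
`ecr:BatchCheckLayerAvailability` on the foreign repo ARNs
(`ecr:GetAuthorizationToken` is not repository-scoped and is already
covered by the managed policy), and make sure the repository policy in
the owning account allows the pull.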
- **Other private registries** (GHCR with a PAT, Docker Hub, …): create a
secret holding `{"username":"<user>","password":"<token>"}` in Secrets
Manager and set `repositoryCredentials.credentialsParameter` on the task
def container to that secret's ARN; extend `ecs.tf` accordingly.
## TLS
`terraform plan` refuses to provision an HTTP-only ALB by default — TLS
is the supported posture. Two paths:
**Production / staging — provide an ACM certificate:**
1. Create or import an ACM cert in `var.region` covering the DNS name you
plan to point at the ALB.
2. Set `acm_certificate_arn = "arn:aws:acm:..."` in tfvars and apply.
Result: a 443 listener carries the path-routing rules; the 80 listener
serves a permanent 301 redirect to HTTPS, so HTTP clients are
automatically upgraded.
**Trial / dev — explicitly opt into HTTP-only:**
Set `allow_plaintext_alb = true` in tfvars. Without this flag, plan fails
with a clear error pointing at the precondition. Intended for short-lived
trial / dev stacks only.
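
The decision the precondition encodes can be summarized as a three-way branch. The shell sketch below mirrors it for illustration only; `alb_listener_mode` is a hypothetical function, and the real check lives in the `lifecycle` block of the HTTP listener in `alb.tf`.

```bash
# Hypothetical helper: given acm_certificate_arn and allow_plaintext_alb,
# report which listener layout the stack would produce.
alb_listener_mode() {
  cert_arn=$1
  allow_plaintext=${2:-false}
  if [ -n "$cert_arn" ]; then
    echo "https: rules on 443, port 80 redirects with 301"
  elif [ "$allow_plaintext" = true ]; then
    echo "http-only: rules on 80"
  else
    echo "error: set acm_certificate_arn or allow_plaintext_alb = true" >&2
    return 1
  fi
}

alb_listener_mode 'arn:aws:acm:us-west-2:111122223333:certificate/example' false
alb_listener_mode '' true
```

Note that a certificate ARN takes precedence: with both inputs set, the stack still provisions the HTTPS layout.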
## Storage and database retention
Two opt-in tripwires guard against accidental data loss on
`terraform destroy`:
- **`skip_final_snapshot`** (Aurora; default `false`) — destroying the
cluster takes a `<cluster>-final-<short-sha>` snapshot first.
- **`s3_force_destroy`** (S3 bucket holding request log archives,
`/v1/files` content, and the S3 cache backend; default `false`) —
`terraform destroy` against a non-empty bucket fails.
Flip either to `true` only for ephemeral / CI stacks where you accept
losing the contents.
## Files
| File | What's in it |
| ----------------- | --------------------------------------------------------------------- |
| `versions.tf` | Terraform + provider version constraints |
| `providers.tf` | AWS provider (region + default tags) |
| `variables.tf` | All input variables |
| `locals.tf` | Path-prefix lists for ALB routing (mirror of `helm/.../ingress.yaml`) |
| `network.tf` | VPC, subnets, IGW, NAT, route tables, security groups |
| `secrets.tf` | Secrets Manager entries + random passwords |
| `rds.tf` | Aurora Postgres cluster + writer / reader instances |
| `redis.tf` | ElastiCache Redis |
| `s3.tf` | S3 bucket + task-role policy scoped to it |
| `iam.tf` | Task execution + task roles, including `rds-db:connect` |
| `ecs.tf` | ECS cluster, task definitions, services for the three components |
| `alb.tf` | ALB, listener, target groups, path-routing rules |
| `migrations.tf` | One-off migration task definition |
| `outputs.tf` | DNS name, secret ARN, bootstrap SQL, migration `run-task` command |

@@ -0,0 +1,179 @@
resource "aws_lb" "this" {
  name               = local.name
  load_balancer_type = "application"
  internal           = false
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id
  idle_timeout       = 120
}

locals {
  # When an ACM cert ARN is provided we provision a 443 listener carrying
  # the path-routing rules and downgrade the 80 listener to a redirect.
  tls_enabled        = var.acm_certificate_arn != ""
  rules_listener_arn = local.tls_enabled ? aws_lb_listener.https[0].arn : aws_lb_listener.http.arn
}
# Target groups: one per component. IP target type because Fargate tasks
# are addressed by ENI IP, not instance.
resource "aws_lb_target_group" "gateway" {
  name        = "${local.name}-gateway"
  port        = 4000
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = aws_vpc.this.id

  health_check {
    path                = "/health/readiness"
    matcher             = "200-299"
    interval            = 30
    timeout             = 10
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }

  deregistration_delay = 30
}

resource "aws_lb_target_group" "backend" {
  name        = "${local.name}-backend"
  port        = 4001
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = aws_vpc.this.id

  health_check {
    path                = "/health/readiness"
    matcher             = "200-299"
    interval            = 30
    timeout             = 10
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }

  deregistration_delay = 30
}

resource "aws_lb_target_group" "ui" {
  name        = "${local.name}-ui"
  port        = 3000
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = aws_vpc.this.id

  health_check {
    path                = "/healthz"
    matcher             = "200-299"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }

  deregistration_delay = 30
}
# HTTP listener. When TLS is enabled this only serves a permanent
# 301 redirect to HTTPS; otherwise it carries the path-routing rules
# (default backend).
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.this.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = local.tls_enabled ? "redirect" : "forward"

    dynamic "redirect" {
      for_each = local.tls_enabled ? [1] : []
      content {
        port        = "443"
        protocol    = "HTTPS"
        status_code = "HTTP_301"
      }
    }

    target_group_arn = local.tls_enabled ? null : aws_lb_target_group.backend.arn
  }

  # Default-deny on the HTTP-only path: TLS is the supported posture.
  # Operators must either supply an ACM cert or explicitly opt in.
  lifecycle {
    precondition {
      condition     = local.tls_enabled || var.allow_plaintext_alb
      error_message = "ALB has no HTTPS listener. Either set `acm_certificate_arn` to enable TLS, or set `allow_plaintext_alb = true` to opt into HTTP-only (trial / dev only)."
    }
  }
}

# HTTPS listener. Only created when an ACM cert ARN is supplied; terminates
# TLS and carries the same default + path-routing rules.
resource "aws_lb_listener" "https" {
  count             = local.tls_enabled ? 1 : 0
  load_balancer_arn = aws_lb.this.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.backend.arn
  }
}
# UI exact paths (/, /favicon.ico, /ui): priority 10.
resource "aws_lb_listener_rule" "ui_exact" {
  listener_arn = local.rules_listener_arn
  priority     = 10

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ui.arn
  }

  condition {
    path_pattern {
      values = local.ui_exact_paths
    }
  }
}

# UI prefix paths (/_next/*, /litellm-asset-prefix/*, /assets/*, /ui/*): priority 20.
resource "aws_lb_listener_rule" "ui_prefix" {
  listener_arn = local.rules_listener_arn
  priority     = 20

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ui.arn
  }

  condition {
    path_pattern {
      values = local.ui_path_prefixes
    }
  }
}

# Gateway prefix rules: one per chunk-of-5, because ALB caps a path-pattern
# condition at 5 values. Priorities 100..(100 + N).
resource "aws_lb_listener_rule" "gateway" {
  for_each     = { for idx, chunk in local.gateway_path_chunks : idx => chunk }
  listener_arn = local.rules_listener_arn
  priority     = 100 + tonumber(each.key)

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.gateway.arn
  }

  condition {
    path_pattern {
      values = each.value
    }
  }
}

@@ -0,0 +1,105 @@
# Application Auto Scaling for the three ECS services. Mirrors the HPA values
# baked into the helm chart at helm/litellm/values.yaml:
#
# gateway: 1-10 replicas, target 70% CPU + 80% memory
# backend: 1-4 replicas, target 70% CPU
# ui: 1-3 replicas, target 80% CPU (off by default; nginx static export)
#
# Each service gets a scalable target plus one target-tracking policy per metric.
# When autoscaling is disabled (count=0) the resources collapse cleanly out of
# the plan; the service's desired_count from ecs.tf stays in effect.
# ---------- Gateway ----------
resource "aws_appautoscaling_target" "gateway" {
  count              = var.gateway_autoscaling_enabled ? 1 : 0
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.gateway.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = var.gateway_min_capacity
  max_capacity       = var.gateway_max_capacity
}

resource "aws_appautoscaling_policy" "gateway_cpu" {
  count              = var.gateway_autoscaling_enabled ? 1 : 0
  name               = "${local.name}-gateway-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.gateway[0].service_namespace
  resource_id        = aws_appautoscaling_target.gateway[0].resource_id
  scalable_dimension = aws_appautoscaling_target.gateway[0].scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = var.gateway_cpu_target
  }
}

resource "aws_appautoscaling_policy" "gateway_memory" {
  # Memory policy is optional; set gateway_memory_target = 0 to omit it.
  count              = var.gateway_autoscaling_enabled && var.gateway_memory_target > 0 ? 1 : 0
  name               = "${local.name}-gateway-memory"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.gateway[0].service_namespace
  resource_id        = aws_appautoscaling_target.gateway[0].resource_id
  scalable_dimension = aws_appautoscaling_target.gateway[0].scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    target_value = var.gateway_memory_target
  }
}
# ---------- Backend ----------
resource "aws_appautoscaling_target" "backend" {
  count              = var.backend_autoscaling_enabled ? 1 : 0
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.backend.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = var.backend_min_capacity
  max_capacity       = var.backend_max_capacity
}

resource "aws_appautoscaling_policy" "backend_cpu" {
  count              = var.backend_autoscaling_enabled ? 1 : 0
  name               = "${local.name}-backend-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.backend[0].service_namespace
  resource_id        = aws_appautoscaling_target.backend[0].resource_id
  scalable_dimension = aws_appautoscaling_target.backend[0].scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = var.backend_cpu_target
  }
}
# ---------- UI ----------
resource "aws_appautoscaling_target" "ui" {
  count              = var.ui_autoscaling_enabled ? 1 : 0
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.ui.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = var.ui_min_capacity
  max_capacity       = var.ui_max_capacity
}

resource "aws_appautoscaling_policy" "ui_cpu" {
  count              = var.ui_autoscaling_enabled ? 1 : 0
  name               = "${local.name}-ui-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.ui[0].service_namespace
  resource_id        = aws_appautoscaling_target.ui[0].resource_id
  scalable_dimension = aws_appautoscaling_target.ui[0].scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = var.ui_cpu_target
  }
}

@@ -0,0 +1,185 @@
# Auto-runs the two manual steps that used to follow `terraform apply`:
#
#   1. Create the IAM-authed Postgres user (litellm_app): uses the postgres:16
#      image with the master password from Secrets Manager.
#   2. Run prisma migrate deploy: reuses the existing
#      aws_ecs_task_definition.migrations task def from migrations.tf.
#
# Both are invoked via `terraform_data` provisioners. Gateway/backend services
# in ecs.tf depend on `terraform_data.migration`, so on a fresh apply they
# don't start until the schema is in place, which leaves no crash-loop window.
#
# Triggers:
#   - bootstrap_db: re-runs if the Aurora cluster is recreated, or if the
#     bootstrap task definition (image/SQL) changes.
#   - migration: re-runs if the migration task def revision changes (e.g., new
#     backend image with new prisma migration files) or if bootstrap re-ran.
#
# Requires `aws` CLI on the machine running terraform. For laptop usage that's
# fine; for CI/CD the runner image needs `aws`.
# ---------- IAM ----------
# Execution role can already read the runtime secrets (master_key, user-provided
# extras; see iam.tf). The DB master password lives in a separate secret used
# only here, so we grant access in an additive policy.
resource "aws_iam_policy" "bootstrap_secrets" {
  name = "${local.name}-bootstrap-secrets-access"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [aws_secretsmanager_secret.db_master_password.arn]
    }]
  })
}

resource "aws_iam_role_policy_attachment" "task_execution_bootstrap_secrets" {
  role       = aws_iam_role.task_execution.name
  policy_arn = aws_iam_policy.bootstrap_secrets.arn
}
# ---------- Bootstrap task def ----------
resource "aws_cloudwatch_log_group" "bootstrap_db" {
  name              = "/ecs/${local.name}/bootstrap-db"
  retention_in_days = var.log_retention_days
}

locals {
  # Idempotent: CREATE USER is wrapped in DO/EXCEPTION; GRANTs are
  # idempotent by definition (re-granting is a no-op). Safe to re-run on
  # any subsequent apply.
  bootstrap_sql = <<-SQL
    DO $$
    BEGIN
    CREATE USER ${var.db_username};
    EXCEPTION WHEN duplicate_object THEN NULL;
    END $$;
    GRANT rds_iam TO ${var.db_username};
    GRANT ALL PRIVILEGES ON DATABASE ${var.db_name} TO ${var.db_username};
    GRANT ALL ON SCHEMA public TO ${var.db_username};
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO ${var.db_username};
    ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO ${var.db_username};
  SQL
}

resource "aws_ecs_task_definition" "bootstrap_db" {
  family                   = "${local.name}-bootstrap-db"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.task_execution.arn
  task_role_arn            = aws_iam_role.task.arn

  container_definitions = jsonencode([{
    name      = "psql"
    image     = "postgres:16-alpine"
    essential = true
    environment = [
      { name = "PGHOST", value = aws_rds_cluster.this.endpoint },
      { name = "PGPORT", value = tostring(aws_rds_cluster.this.port) },
      { name = "PGUSER", value = var.db_master_username },
      { name = "PGDATABASE", value = var.db_name },
      { name = "BOOTSTRAP_SQL", value = local.bootstrap_sql },
    ]
    secrets = [
      # `:password::` extracts the password field out of the JSON secret.
      { name = "PGPASSWORD", valueFrom = "${aws_secretsmanager_secret.db_master_password.arn}:password::" },
    ]
    entryPoint = ["sh", "-c"]
    command    = ["echo \"$BOOTSTRAP_SQL\" | psql -v ON_ERROR_STOP=1"]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = aws_cloudwatch_log_group.bootstrap_db.name
        awslogs-region        = var.region
        awslogs-stream-prefix = "bootstrap"
      }
    }
  }])
}
# ---------- Bootstrap trigger ----------
resource "terraform_data" "bootstrap_db" {
triggers_replace = {
cluster_resource_id = aws_rds_cluster.this.cluster_resource_id
task_def_revision = aws_ecs_task_definition.bootstrap_db.revision
}
provisioner "local-exec" {
interpreter = ["bash", "-c"]
environment = {
CLUSTER = aws_ecs_cluster.this.name
TASK_DEF = aws_ecs_task_definition.bootstrap_db.arn
SUBNETS = join(",", aws_subnet.private[*].id)
SG = aws_security_group.tasks.id
REGION = var.region
LOG_GRP = aws_cloudwatch_log_group.bootstrap_db.name
}
command = <<-EOT
set -euo pipefail
task_arn=$(aws ecs run-task --region "$REGION" --cluster "$CLUSTER" \
--launch-type FARGATE --task-definition "$TASK_DEF" \
--network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$SG],assignPublicIp=DISABLED}" \
--query 'tasks[0].taskArn' --output text)
echo "bootstrap task: $task_arn"
aws ecs wait tasks-stopped --region "$REGION" --cluster "$CLUSTER" --tasks "$task_arn"
task_id=$(echo "$task_arn" | awk -F/ '{print $NF}')
exit_code=$(aws ecs describe-tasks --region "$REGION" --cluster "$CLUSTER" --tasks "$task_id" \
--query 'tasks[0].containers[0].exitCode' --output text)
if [ "$exit_code" != "0" ]; then
echo "Bootstrap failed (exit=$exit_code). Logs: $LOG_GRP" >&2
exit 1
fi
EOT
}
depends_on = [
aws_rds_cluster_instance.writer,
aws_iam_role_policy_attachment.task_execution_bootstrap_secrets,
]
}
# ---------- Migration trigger ----------
# Reuses the task definition from migrations.tf; this resource just invokes
# it and waits.
resource "terraform_data" "migration" {
triggers_replace = {
task_def_revision = aws_ecs_task_definition.migrations.revision
bootstrap_id = terraform_data.bootstrap_db.id
}
provisioner "local-exec" {
interpreter = ["bash", "-c"]
environment = {
CLUSTER = aws_ecs_cluster.this.name
TASK_DEF = aws_ecs_task_definition.migrations.arn
SUBNETS = join(",", aws_subnet.private[*].id)
SG = aws_security_group.tasks.id
REGION = var.region
LOG_GRP = aws_cloudwatch_log_group.migrations.name
}
command = <<-EOT
set -euo pipefail
task_arn=$(aws ecs run-task --region "$REGION" --cluster "$CLUSTER" \
--launch-type FARGATE --task-definition "$TASK_DEF" \
--network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$SG],assignPublicIp=DISABLED}" \
--query 'tasks[0].taskArn' --output text)
echo "migration task: $task_arn"
aws ecs wait tasks-stopped --region "$REGION" --cluster "$CLUSTER" --tasks "$task_arn"
task_id=$(echo "$task_arn" | awk -F/ '{print $NF}')
exit_code=$(aws ecs describe-tasks --region "$REGION" --cluster "$CLUSTER" --tasks "$task_id" \
--query 'tasks[0].containers[0].exitCode' --output text)
if [ "$exit_code" != "0" ]; then
echo "Migration failed (exit=$exit_code). Logs: $LOG_GRP" >&2
exit 1
fi
EOT
}
depends_on = [terraform_data.bootstrap_db]
}

@ -0,0 +1,347 @@
resource "aws_ecs_cluster" "this" {
name = local.name
setting {
name = "containerInsights"
value = "enabled"
}
}
resource "aws_cloudwatch_log_group" "gateway" {
name = "/ecs/${local.name}/gateway"
retention_in_days = var.log_retention_days
}
resource "aws_cloudwatch_log_group" "backend" {
name = "/ecs/${local.name}/backend"
retention_in_days = var.log_retention_days
}
resource "aws_cloudwatch_log_group" "ui" {
name = "/ecs/${local.name}/ui"
retention_in_days = var.log_retention_days
}
resource "aws_cloudwatch_log_group" "migrations" {
name = "/ecs/${local.name}/migrations"
retention_in_days = var.log_retention_days
}
# Shared env block fed to gateway, backend, and the migration task. Mirrors
# the helm chart's `litellm.serverEnv` helper on the IAM-auth branch:
# DATABASE_URL is assembled at runtime by
# litellm/proxy/auth/rds_iam_token.py::init_iam_db_url_from_env from
# HOST/PORT/USER/NAME plus an IAM-signed token, so no DB password is needed
# in the task definition.
locals {
shared_env = [
{ name = "IAM_TOKEN_DB_AUTH", value = "true" },
{ name = "DATABASE_HOST", value = aws_rds_cluster.this.endpoint },
{ name = "DATABASE_PORT", value = tostring(aws_rds_cluster.this.port) },
{ name = "DATABASE_USER", value = var.db_username },
{ name = "DATABASE_NAME", value = var.db_name },
{ name = "DATABASE_HOST_READ_REPLICA", value = aws_rds_cluster.this.reader_endpoint },
{ name = "DATABASE_PORT_READ_REPLICA", value = tostring(aws_rds_cluster.this.port) },
{ name = "REDIS_HOST", value = aws_elasticache_replication_group.this.primary_endpoint_address },
{ name = "REDIS_PORT", value = tostring(aws_elasticache_replication_group.this.port) },
# transit_encryption_enabled = true on the replication group means the
# proxy must connect via rediss://. _redis.get_redis_url_from_environment
# honors REDIS_SSL to flip the scheme.
{ name = "REDIS_SSL", value = "true" },
# S3 bucket referenced from proxy_config via os.environ/S3_BUCKET_NAME
# (e.g. cache backend, request log archival, /files passthrough).
{ name = "S3_BUCKET_NAME", value = aws_s3_bucket.this.bucket },
{ name = "S3_REGION_NAME", value = var.region },
# boto3 inside generate_iam_auth_token reads AWS_REGION_NAME first, then
# AWS_REGION. Set both for compatibility.
{ name = "AWS_REGION", value = var.region },
{ name = "AWS_REGION_NAME", value = var.region },
]
shared_secrets = concat(
[
{ name = "LITELLM_MASTER_KEY", valueFrom = aws_secretsmanager_secret.master_key.arn },
],
var.litellm_license == "" ? [] : [
{ name = "LITELLM_LICENSE", valueFrom = aws_secretsmanager_secret.license[0].arn },
],
)
# Backend-only managed secrets. UI_PASSWORD is consumed by the management
# API (UI login flow) and has no use on the gateway data plane.
backend_managed_secrets = var.ui_password == "" ? [] : [
{ name = "UI_PASSWORD", valueFrom = aws_secretsmanager_secret.ui_password[0].arn },
]
gateway_extra_env_list = [
for k, v in var.gateway_extra_env : { name = k, value = v }
]
backend_extra_env_list = [
for k, v in var.backend_extra_env : { name = k, value = v }
]
backend_default_env = [
{ name = "STORE_MODEL_IN_DB", value = "true" },
]
gateway_extra_secrets_list = [
for k, v in var.gateway_extra_secrets : { name = k, valueFrom = v }
]
backend_extra_secrets_list = [
for k, v in var.backend_extra_secrets : { name = k, valueFrom = v }
]
# Mirrors the helm chart's gateway.config.create / configmap pattern.
# ECS Fargate has no ConfigMap analogue, so we pass the YAML as a
# base64-encoded env var and decode it at container start via a tiny
# python shim that runs before the image's normal uvicorn entrypoint.
proxy_config_enabled = length(keys(var.proxy_config)) > 0
proxy_config_b64 = local.proxy_config_enabled ? base64encode(yamlencode(var.proxy_config)) : ""
proxy_config_env = local.proxy_config_enabled ? [
{ name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 },
{ name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
] : []
# Gateway always needs --workers wired in (no NUM_WORKERS env var support
# in the image entrypoint). When proxy_config is enabled we also have to
# decode the base64 config first, so the command goes through `sh -c`;
# otherwise we mirror the image's ENTRYPOINT and append the args via `command`.
gateway_uvicorn_args = "--host 0.0.0.0 --port 4000 --workers ${var.gateway_num_workers}"
backend_uvicorn_args = "--host 0.0.0.0 --port 4001"
gateway_proxy_overrides = local.proxy_config_enabled ? {
entryPoint = ["sh", "-c"]
command = [
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\" && exec uvicorn gateway.main:app ${local.gateway_uvicorn_args}"
]
} : {
# Mirror the image's ENTRYPOINT so we can append --workers via command.
entryPoint = ["uvicorn", "gateway.main:app"]
command = split(" ", local.gateway_uvicorn_args)
}
backend_proxy_overrides = local.proxy_config_enabled ? {
entryPoint = ["sh", "-c"]
command = [
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\" && exec uvicorn backend.main:app ${local.backend_uvicorn_args}"
]
} : {}
}
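# Example inputs (terraform.tfvars), as a sketch. The model entry and the ARN
# are made up, and this assumes var.proxy_config accepts LiteLLM's usual
# model_list shape:
#
#   proxy_config = {
#     model_list = [{
#       model_name = "gpt-4o"
#       litellm_params = {
#         model   = "openai/gpt-4o"
#         api_key = "os.environ/OPENAI_API_KEY"
#       }
#     }]
#   }
#
#   gateway_extra_secrets = {
#     OPENAI_API_KEY = "arn:aws:secretsmanager:us-east-1:111122223333:secret:openai-AbCdEf"
#   }
#
# With a non-empty proxy_config, proxy_config_enabled is true, the YAML is
# written to CONFIG_FILE_PATH at container start, and both services launch
# through the `sh -c` override.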
# ---------- Gateway ----------
resource "aws_ecs_task_definition" "gateway" {
family = "${local.name}-gateway"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.gateway_cpu
memory = var.gateway_memory
execution_role_arn = aws_iam_role.task_execution.arn
task_role_arn = aws_iam_role.task.arn
container_definitions = jsonencode([
merge(
{
name = "gateway"
image = var.gateway_image
essential = true
portMappings = [{ containerPort = 4000, protocol = "tcp" }]
environment = concat(
local.shared_env,
local.gateway_extra_env_list,
local.proxy_config_env,
)
secrets = concat(local.shared_secrets, local.gateway_extra_secrets_list)
# Container-level healthCheck intentionally omitted: the wolfi
# runtime image doesn't ship curl/wget. The ALB target group polls
# /health/readiness.
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.gateway.name
awslogs-region = var.region
awslogs-stream-prefix = "gateway"
}
}
},
local.gateway_proxy_overrides,
)
])
}
resource "aws_ecs_service" "gateway" {
name = "${local.name}-gateway"
cluster = aws_ecs_cluster.this.id
task_definition = aws_ecs_task_definition.gateway.arn
desired_count = var.gateway_desired_count
launch_type = "FARGATE"
network_configuration {
subnets = aws_subnet.private[*].id
security_groups = [aws_security_group.tasks.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.gateway.arn
container_name = "gateway"
container_port = 4000
}
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
# desired_count is owned by Application Auto Scaling once enabled (autoscaling.tf).
# Terraform sets the initial value from var.gateway_desired_count, then steps aside.
lifecycle {
ignore_changes = [desired_count]
}
# Don't start until the schema migration has run. Otherwise the proxy
# boots, Prisma fails on the missing tables, and ECS thrashes the task.
depends_on = [
aws_lb_listener.http,
aws_lb_listener.https,
terraform_data.migration,
]
}
# ---------- Backend ----------
resource "aws_ecs_task_definition" "backend" {
family = "${local.name}-backend"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.backend_cpu
memory = var.backend_memory
execution_role_arn = aws_iam_role.task_execution.arn
task_role_arn = aws_iam_role.task.arn
container_definitions = jsonencode([
merge(
{
name = "backend"
image = var.backend_image
essential = true
portMappings = [{ containerPort = 4001, protocol = "tcp" }]
environment = concat(
local.shared_env,
local.backend_default_env,
local.backend_extra_env_list,
local.proxy_config_env,
)
secrets = concat(local.shared_secrets, local.backend_managed_secrets, local.backend_extra_secrets_list)
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.backend.name
awslogs-region = var.region
awslogs-stream-prefix = "backend"
}
}
},
local.backend_proxy_overrides,
)
])
}
resource "aws_ecs_service" "backend" {
name = "${local.name}-backend"
cluster = aws_ecs_cluster.this.id
task_definition = aws_ecs_task_definition.backend.arn
desired_count = var.backend_desired_count
launch_type = "FARGATE"
network_configuration {
subnets = aws_subnet.private[*].id
security_groups = [aws_security_group.tasks.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.backend.arn
container_name = "backend"
container_port = 4001
}
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
lifecycle {
ignore_changes = [desired_count]
}
depends_on = [
aws_lb_listener.http,
aws_lb_listener.https,
terraform_data.migration,
]
}
# ---------- UI ----------
# task_role is deliberately the unprivileged ui_task: the UI has no DB,
# S3, or Secrets Manager dependency, and inheriting the shared `task`
# role would expose every data-plane secret to a compromised UI
# container via the task metadata endpoint.
resource "aws_ecs_task_definition" "ui" {
family = "${local.name}-ui"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = var.ui_cpu
memory = var.ui_memory
execution_role_arn = aws_iam_role.task_execution.arn
task_role_arn = aws_iam_role.ui_task.arn
container_definitions = jsonencode([
{
name = "ui"
image = var.ui_image
essential = true
portMappings = [{ containerPort = 3000, protocol = "tcp" }]
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.ui.name
awslogs-region = var.region
awslogs-stream-prefix = "ui"
}
}
}
])
}
resource "aws_ecs_service" "ui" {
name = "${local.name}-ui"
cluster = aws_ecs_cluster.this.id
task_definition = aws_ecs_task_definition.ui.arn
desired_count = var.ui_desired_count
launch_type = "FARGATE"
network_configuration {
subnets = aws_subnet.private[*].id
security_groups = [aws_security_group.tasks.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.ui.arn
container_name = "ui"
container_port = 3000
}
deployment_minimum_healthy_percent = 50
deployment_maximum_percent = 200
lifecycle {
ignore_changes = [desired_count]
}
depends_on = [
aws_lb_listener.http,
aws_lb_listener.https,
]
}

@ -0,0 +1,114 @@
# ECS task execution role: used by the agent to pull images, write logs,
# and resolve secrets at task start.
data "aws_iam_policy_document" "task_assume" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ecs-tasks.amazonaws.com"]
}
}
}
resource "aws_iam_role" "task_execution" {
name = "${local.name}-task-execution"
assume_role_policy = data.aws_iam_policy_document.task_assume.json
}
resource "aws_iam_role_policy_attachment" "task_execution" {
role = aws_iam_role.task_execution.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
# User-provided extra secrets may be passed as the bare secret ARN
# ("arn:aws:secretsmanager:...:secret:name-AbCdEf") or in the JSON-key form
# ECS supports, fully spelled out as
# "arn:...:secret:name-AbCdEf:jsonKey:versionStage:versionId" with any of
# the trailing parts blank ("...:jsonKey::" being the most common). The IAM
# policy resource must always be the bare ARN, so we split on ':' and keep
# the first 7 components. That is robust to any combination of
# empty/non-empty version-stage/version-id suffixes that a regex would
# otherwise have to enumerate.
locals {
extra_secret_value_froms = concat(
values(var.gateway_extra_secrets),
values(var.backend_extra_secrets),
)
extra_secret_arns = distinct([
for v in local.extra_secret_value_froms :
join(":", slice(split(":", v), 0, 7))
])
}
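# Worked example of the normalisation above, as it could be checked in
# `terraform console`. The ARN is a made-up sample, not a real secret:
#
#   > join(":", slice(split(":", "arn:aws:secretsmanager:us-east-1:111122223333:secret:openai-AbCdEf:api_key::"), 0, 7))
#   "arn:aws:secretsmanager:us-east-1:111122223333:secret:openai-AbCdEf"
#
# A bare ARN already has exactly 7 components and passes through unchanged.
# A value with fewer than 7 components would make slice() fail, so this
# assumes every entry is a Secrets Manager ARN.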
# Execution role can read the managed secrets + any caller-provided extras
# so ECS can resolve them when launching tasks. Image pulls inherit the
# managed AmazonECSTaskExecutionRolePolicy.
data "aws_iam_policy_document" "secrets_access" {
statement {
actions = ["secretsmanager:GetSecretValue"]
resources = concat(
[aws_secretsmanager_secret.master_key.arn],
aws_secretsmanager_secret.license[*].arn,
aws_secretsmanager_secret.ui_password[*].arn,
local.extra_secret_arns,
)
}
}
resource "aws_iam_policy" "secrets_access" {
name = "${local.name}-secrets-access"
policy = data.aws_iam_policy_document.secrets_access.json
}
resource "aws_iam_role_policy_attachment" "task_execution_secrets" {
role = aws_iam_role.task_execution.name
policy_arn = aws_iam_policy.secrets_access.arn
}
# ---------- Task role ----------
#
# Assumed by the running container. Gets `rds-db:connect` so the proxy can
# mint IAM-signed Postgres tokens for the app user. Layer additional
# policies here (e.g. Bedrock invoke, S3 read) when the proxy needs them.
resource "aws_iam_role" "task" {
name = "${local.name}-task"
assume_role_policy = data.aws_iam_policy_document.task_assume.json
}
data "aws_caller_identity" "current" {}
data "aws_iam_policy_document" "rds_iam_connect" {
statement {
actions = ["rds-db:connect"]
resources = [
"arn:aws:rds-db:${var.region}:${data.aws_caller_identity.current.account_id}:dbuser:${aws_rds_cluster.this.cluster_resource_id}/${var.db_username}",
]
}
}
resource "aws_iam_policy" "rds_iam_connect" {
name = "${local.name}-rds-iam-connect"
policy = data.aws_iam_policy_document.rds_iam_connect.json
}
resource "aws_iam_role_policy_attachment" "task_rds_iam_connect" {
role = aws_iam_role.task.name
policy_arn = aws_iam_policy.rds_iam_connect.arn
}
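# Sketch of layering an extra policy onto the task role, here Bedrock invoke.
# Resource names are hypothetical and the wildcard resource is an assumption;
# scope it to specific model ARNs where possible:
#
#   data "aws_iam_policy_document" "bedrock_invoke_example" {
#     statement {
#       actions   = ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"]
#       resources = ["*"]
#     }
#   }
#
#   resource "aws_iam_policy" "bedrock_invoke_example" {
#     name   = "${local.name}-bedrock-invoke"
#     policy = data.aws_iam_policy_document.bedrock_invoke_example.json
#   }
#
#   resource "aws_iam_role_policy_attachment" "task_bedrock_invoke_example" {
#     role       = aws_iam_role.task.name
#     policy_arn = aws_iam_policy.bedrock_invoke_example.arn
#   }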
# ---------- UI task role ----------
#
# The UI is static nginx with no DB, S3, or Secrets Manager dependencies,
# so it deliberately does NOT inherit the shared `task` role's
# rds-db:connect / S3 / extra-secrets policies. Empty policy set: the
# only thing exposed via the task metadata endpoint is an identity that
# can't reach any LiteLLM data-plane resource. The shared
# `task_execution` role still pulls the image and writes logs (its
# credentials aren't surfaced to the container).
resource "aws_iam_role" "ui_task" {
name = "${local.name}-ui-task"
assume_role_policy = data.aws_iam_policy_document.task_assume.json
}

@ -0,0 +1,75 @@
# Gateway path prefixes mirrored verbatim from gateway/routes/allowlist.py
# and the helm ingress in helm/litellm/templates/ingress.yaml. Anything not in
# this list and not a UI asset path falls through to the backend (management
# API) catch-all rule on the ALB.
#
# ALB listener rules cap path-pattern conditions at 5 values per rule, so we
# chunk this list and emit one rule per chunk.
locals {
# Every resource the stack creates is named `<tenant>-litellm-<env>`
# (or that with a per-resource suffix). Computed once here so the rest of
# the stack can reference local.name.
name = "${var.tenant}-litellm-${var.env}"
gateway_path_prefixes = [
"/v1/chat/*", "/chat/*",
"/v1/completions*", "/completions*",
"/v1/embeddings*", "/embeddings*",
"/v1/moderations*", "/moderations*",
"/v1/audio/*", "/audio/*",
"/v1/images/*", "/images/*",
"/v1/files*", "/files*",
"/v1/batches*", "/batches*",
"/v1/fine_tuning/*", "/fine_tuning/*",
"/v1/fine-tuning/*", "/fine-tuning/*",
"/v1/responses*", "/responses*",
"/v1/threads*", "/threads*",
"/v1/assistants*", "/assistants*",
"/v1/vector_stores*", "/vector_stores*",
"/v1/indexes*",
"/v1/models*", "/models*",
"/openai/*", "/engines/*",
"/v1/messages*", "/messages*",
"/v1/skills/*", "/v1/a2a/*",
"/v1/rerank*", "/v2/rerank*", "/rerank*",
"/v1/ocr*", "/ocr*",
"/v1/rag/*", "/rag/*",
"/v1/video/*", "/v1/videos/*", "/video/*", "/videos/*",
"/v1/search*", "/search*",
"/v1/containers/*", "/containers/*",
"/v1/evals/*",
"/v1/memory/*",
"/queue/chat/*",
"/v1beta/*",
"/interactions/*",
"/anthropic/*", "/azure/*", "/azure_ai/*", "/aws/*", "/bedrock/*",
"/cohere/*", "/gemini/*", "/google/*",
"/vertex_ai/*", "/vertex-ai/*",
"/assemblyai/*", "/eu.assemblyai/*",
"/langfuse/*", "/vllm/*",
"/mistral/*", "/groq/*", "/voyage/*", "/cursor/*", "/milvus/*",
"/openai_passthrough/*",
"/toolset/*",
"/v1/realtime*", "/realtime*",
"/health*", "/metrics", "/test*",
]
# Static UI asset prefixes, handled by the UI service rather than the backend
# catch-all. /favicon.ico and / are also UI but added as exact rules.
ui_path_prefixes = [
"/litellm-asset-prefix/*",
"/_next/*",
"/assets/*",
"/ui/*",
]
ui_exact_paths = [
"/",
"/favicon.ico",
"/ui",
]
# ALB rules accept 5 path-pattern values per condition. Chunk the prefix
# list so each chunk becomes one rule.
gateway_path_chunks = chunklist(local.gateway_path_prefixes, 5)
}
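# How the chunks might be consumed, as a sketch only. The resource name and
# priority base are hypothetical, and the stack's actual listener rules may
# differ:
#
#   resource "aws_lb_listener_rule" "gateway_example" {
#     count        = length(local.gateway_path_chunks)
#     listener_arn = aws_lb_listener.http.arn
#     priority     = 100 + count.index # hypothetical base priority
#
#     action {
#       type             = "forward"
#       target_group_arn = aws_lb_target_group.gateway.arn
#     }
#
#     condition {
#       path_pattern {
#         values = local.gateway_path_chunks[count.index]
#       }
#     }
#   }
#
# chunklist(["a", "b", "c", "d", "e", "f", "g"], 5) yields
# [["a", "b", "c", "d", "e"], ["f", "g"]], so the final rule may carry fewer
# than 5 patterns.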

@ -0,0 +1,45 @@
# Task definition for the dedicated litellm-migrations image. Mirrors the
# pre-install/pre-upgrade Helm hook in helm/litellm/templates/migrations-job.yaml.
#
# The image (built from migrations/Dockerfile) ships with
# `ENTRYPOINT ["python3", "/app/run.py"]`. run.py assembles DATABASE_URL from
# the discrete DATABASE_* env vars (IAM auth here) via DatabaseURLSettings,
# then calls ProxyExtrasDBManager.setup_database(), i.e. `prisma migrate
# deploy` with the v2 resolver and P3005/P3009/P3018 recovery. It does NOT
# read CONFIG_FILE_PATH, the master key, or DISABLE_SCHEMA_UPDATE, so we
# don't pass them.
#
# Invoked automatically by `terraform_data.migration` in bootstrap.tf on the
# first apply and whenever this task definition's revision changes or the DB
# bootstrap re-runs (after the IAM-authed user has been created). The
# `migration_run_command` output is preserved for break-glass manual re-runs.
resource "aws_ecs_task_definition" "migrations" {
family = "${local.name}-migrations"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
# Prisma's Node + Rust engine plus the v2 migration resolver routinely
# peaks well above 1 GiB while applying the schema. 4 GiB gives plenty
# of headroom; CPU stays low because `prisma migrate deploy` is
# single-threaded.
cpu = 512
memory = 4096
execution_role_arn = aws_iam_role.task_execution.arn
task_role_arn = aws_iam_role.task.arn
container_definitions = jsonencode([{
name = "migrations"
image = var.migrations_image
essential = true
# No entryPoint/command override: the image's ENTRYPOINT runs run.py.
environment = local.shared_env
logConfiguration = {
logDriver = "awslogs"
options = {
awslogs-group = aws_cloudwatch_log_group.migrations.name
awslogs-region = var.region
awslogs-stream-prefix = "migrations"
}
}
}])
}

@ -0,0 +1,172 @@
data "aws_availability_zones" "available" {
state = "available"
}
resource "aws_vpc" "this" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = local.name }
}
resource "aws_internet_gateway" "this" {
vpc_id = aws_vpc.this.id
tags = { Name = local.name }
}
# Public subnets (ALB + NAT). One per AZ.
resource "aws_subnet" "public" {
count = length(var.azs)
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.azs[count.index]
map_public_ip_on_launch = true
tags = { Name = "${local.name}-public-${var.azs[count.index]}" }
}
# Private subnets (ECS tasks, RDS, ElastiCache). One per AZ, separate from
# public range.
resource "aws_subnet" "private" {
count = length(var.azs)
vpc_id = aws_vpc.this.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = var.azs[count.index]
tags = { Name = "${local.name}-private-${var.azs[count.index]}" }
}
resource "aws_eip" "nat" {
domain = "vpc"
tags = { Name = "${local.name}-nat" }
depends_on = [aws_internet_gateway.this]
}
# Single NAT gateway in the first public subnet. For HA, replicate per AZ;
# that adds ~$30/mo per gateway, so it is off by default for a baseline deployment.
resource "aws_nat_gateway" "this" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
tags = { Name = local.name }
depends_on = [aws_internet_gateway.this]
}
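# Per-AZ NAT sketch for HA. Not enabled; these blocks would replace the
# single-gateway versions of the same resources, under the assumption that
# each private subnet gets its own route table:
#
#   resource "aws_eip" "nat" {
#     count  = length(var.azs)
#     domain = "vpc"
#   }
#
#   resource "aws_nat_gateway" "this" {
#     count         = length(var.azs)
#     allocation_id = aws_eip.nat[count.index].id
#     subnet_id     = aws_subnet.public[count.index].id
#   }
#
#   resource "aws_route_table" "private" {
#     count  = length(var.azs)
#     vpc_id = aws_vpc.this.id
#     route {
#       cidr_block     = "0.0.0.0/0"
#       nat_gateway_id = aws_nat_gateway.this[count.index].id
#     }
#   }
#
#   resource "aws_route_table_association" "private" {
#     count          = length(var.azs)
#     subnet_id      = aws_subnet.private[count.index].id
#     route_table_id = aws_route_table.private[count.index].id
#   }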
resource "aws_route_table" "public" {
vpc_id = aws_vpc.this.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.this.id
}
tags = { Name = "${local.name}-public" }
}
resource "aws_route_table_association" "public" {
count = length(var.azs)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table" "private" {
vpc_id = aws_vpc.this.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.this.id
}
tags = { Name = "${local.name}-private" }
}
resource "aws_route_table_association" "private" {
count = length(var.azs)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private.id
}
# ---------- Security groups ----------
resource "aws_security_group" "alb" {
name = "${local.name}-alb"
description = "Inbound HTTP/HTTPS to the LiteLLM ALB."
vpc_id = aws_vpc.this.id
ingress {
description = "HTTP from anywhere"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS from anywhere"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "All egress"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "tasks" {
name = "${local.name}-tasks"
description = "ECS tasks (gateway/backend/ui)."
vpc_id = aws_vpc.this.id
ingress {
description = "ALB to tasks"
from_port = 0
to_port = 65535
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
description = "All egress (LLM providers, RDS, Redis)"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "rds" {
name = "${local.name}-rds"
description = "RDS Postgres - tasks only."
vpc_id = aws_vpc.this.id
ingress {
description = "Postgres from ECS tasks"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.tasks.id]
}
}
resource "aws_security_group" "redis" {
name = "${local.name}-redis"
description = "ElastiCache Redis - tasks only."
vpc_id = aws_vpc.this.id
ingress {
description = "Redis from ECS tasks"
from_port = 6379
to_port = 6379
protocol = "tcp"
security_groups = [aws_security_group.tasks.id]
}
}

@ -0,0 +1,72 @@
output "alb_dns_name" {
description = "Public DNS name of the LiteLLM ALB."
value = aws_lb.this.dns_name
}
output "alb_url" {
description = "Proxy URL. Switches scheme based on whether acm_certificate_arn is set; the underlying DNS name is the ALB. The dashboard is served at /, the API at /v1/*."
value = "${local.tls_enabled ? "https" : "http"}://${aws_lb.this.dns_name}"
}
output "ecs_cluster" {
description = "ECS cluster name."
value = aws_ecs_cluster.this.name
}
output "aurora_writer_endpoint" {
description = "Aurora writer endpoint (cluster endpoint). Used by gateway/backend as DATABASE_HOST."
value = aws_rds_cluster.this.endpoint
}
output "aurora_reader_endpoint" {
description = "Aurora reader endpoint. Used by gateway/backend as DATABASE_HOST_READ_REPLICA."
value = aws_rds_cluster.this.reader_endpoint
}
output "redis_endpoint" {
description = "ElastiCache Redis primary endpoint (TLS, transit_encryption_enabled = true)."
value = "${aws_elasticache_replication_group.this.primary_endpoint_address}:${aws_elasticache_replication_group.this.port}"
}
output "s3_bucket" {
description = "S3 bucket name. Exposed to gateway + backend as S3_BUCKET_NAME / S3_REGION_NAME. Reference from proxy_config via `os.environ/S3_BUCKET_NAME`."
value = aws_s3_bucket.this.bucket
}
output "master_key_secret_arn" {
description = "Secrets Manager ARN holding LITELLM_MASTER_KEY. Fetch with `aws secretsmanager get-secret-value --secret-id <arn>`."
value = aws_secretsmanager_secret.master_key.arn
}
output "db_master_password_secret_arn" {
description = "Secrets Manager ARN holding the Aurora master credentials (bootstrap-only). Used to create the IAM-authed application user."
value = aws_secretsmanager_secret.db_master_password.arn
}
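# Pre-baked SQL to run as the master user, creating the IAM-authed
# application user that gateway/backend/migration tasks will authenticate as.
# `terraform_data.bootstrap_db` applies an idempotent version of this
# automatically; the output is kept for manual use.
output "db_bootstrap_sql" {
description = "SQL that creates the IAM-authed app user. Applied automatically by the bootstrap task during apply; run it manually as the master DB user only if that automation is unavailable."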
value = <<-SQL
CREATE USER ${var.db_username};
GRANT rds_iam TO ${var.db_username};
GRANT ALL PRIVILEGES ON DATABASE ${var.db_name} TO ${var.db_username};
GRANT ALL ON SCHEMA public TO ${var.db_username};
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO ${var.db_username};
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO ${var.db_username};
SQL
}
# Pre-baked command for running the one-off migration task. ECS run-task
# needs the subnet + SG IDs at call time, so we render the full command.
output "migration_run_command" {
description = "Shell command that runs the one-off prisma migration task against Aurora. Terraform already runs this task via terraform_data.migration; use this command for manual re-runs."
value = format(
"aws ecs run-task --cluster %s --launch-type FARGATE --task-definition %s --network-configuration 'awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=DISABLED}' --region %s",
aws_ecs_cluster.this.name,
aws_ecs_task_definition.migrations.arn,
join(",", aws_subnet.private[*].id),
aws_security_group.tasks.id,
var.region,
)
}

@ -0,0 +1,13 @@
provider "aws" {
region = var.region
default_tags {
tags = merge(
{
"litellm:stack" = local.name
"managed-by" = "terraform"
},
var.tags,
)
}
}

@ -0,0 +1,85 @@
# Aurora Postgres cluster with one writer + one reader instance, IAM
# database authentication enabled.
#
# Important: enabling IAM auth on the cluster does not by itself grant any
# Postgres user the ability to log in with an IAM token. The application
# user has to be created and granted `rds_iam` by the master user (password
# lives in Secrets Manager; see `db_master_password_secret_arn` in outputs).
# `terraform_data.bootstrap_db` in bootstrap.tf does this automatically with
# an idempotent version of:
#
# CREATE USER {var.db_username};
# GRANT rds_iam TO {var.db_username};
# GRANT ALL PRIVILEGES ON DATABASE {var.db_name} TO {var.db_username};
# GRANT ALL ON SCHEMA public TO {var.db_username};
#
# After that, the gateway/backend/migration tasks (which authenticate as
# `{var.db_username}` via IAM-signed tokens) can connect. The master user
# keeps password authentication and is not granted `rds_iam`; keep it for
# break-glass only.
resource "aws_db_subnet_group" "this" {
name = "${local.name}-db"
subnet_ids = aws_subnet.private[*].id
}
resource "aws_rds_cluster_parameter_group" "this" {
name = "${local.name}-cluster-pg"
family = "aurora-postgresql${split(".", var.db_engine_version)[0]}"
description = "LiteLLM Aurora Postgres cluster parameters."
}
resource "aws_rds_cluster" "this" {
cluster_identifier = local.name
engine = "aurora-postgresql"
engine_mode = "provisioned"
engine_version = var.db_engine_version
database_name = var.db_name
master_username = var.db_master_username
master_password = random_password.db_master_password.result
db_subnet_group_name = aws_db_subnet_group.this.name
vpc_security_group_ids = [aws_security_group.rds.id]
db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.this.name
iam_database_authentication_enabled = true
storage_encrypted = true
apply_immediately = true
# Final-snapshot guard. With the safe default (skip_final_snapshot = false),
# `terraform destroy` takes a snapshot named `<cluster>-final-<short-hash>`
# before dropping the cluster. The short hash is md5(local.name), so it is
# stable per stack name: a repeated destroy/recreate cycle reuses the same
# snapshot name, and the earlier snapshot has to be deleted first.
skip_final_snapshot = var.skip_final_snapshot
final_snapshot_identifier = var.skip_final_snapshot ? null : "${local.name}-final-${substr(md5(local.name), 0, 8)}"
backup_retention_period = 7
preferred_backup_window = "07:00-09:00"
}
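# The final-snapshot suffix above is derived from local.name alone. A sketch
# of a suffix that is regenerated whenever the stack is created from scratch,
# assuming a hypothetical random_id resource (the random provider is already
# used elsewhere in this stack):
#
#   resource "random_id" "final_snapshot" {
#     byte_length = 4
#   }
#
#   # inside aws_rds_cluster.this:
#   final_snapshot_identifier = var.skip_final_snapshot ? null : "${local.name}-final-${random_id.final_snapshot.hex}"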
resource "aws_rds_cluster_instance" "writer" {
identifier = "${local.name}-writer"
cluster_identifier = aws_rds_cluster.this.id
instance_class = var.db_instance_class
engine = aws_rds_cluster.this.engine
engine_version = aws_rds_cluster.this.engine_version
publicly_accessible = false
performance_insights_enabled = true
# Promotion tier 0: first in line during failover, so this instance stays
# the writer unless it goes unhealthy.
promotion_tier = 0
}
resource "aws_rds_cluster_instance" "reader" {
identifier = "${local.name}-reader"
cluster_identifier = aws_rds_cluster.this.id
instance_class = var.db_instance_class
engine = aws_rds_cluster.this.engine
engine_version = aws_rds_cluster.this.engine_version
publicly_accessible = false
performance_insights_enabled = true
# Higher promotion tier, so it won't be picked as writer during a failover
# unless the writer instance itself is gone.
promotion_tier = 15
}


@ -0,0 +1,33 @@
resource "aws_elasticache_subnet_group" "this" {
name = "${local.name}-redis"
subnet_ids = aws_subnet.private[*].id
}
# Replication group (not aws_elasticache_cluster, which is the
# Memcached / single-node Redis resource and can't be upgraded in-place
# to HA). With redis_num_replicas >= 1 we get automatic_failover_enabled
# + multi_az_enabled; at_rest_encryption_enabled and
# transit_encryption_enabled are on unconditionally so Redis traffic is
# TLS-protected; the proxy connects via the rediss:// scheme thanks to
# REDIS_SSL=true in the shared task env (see ecs.tf).
resource "aws_elasticache_replication_group" "this" {
replication_group_id = "${local.name}-redis"
description = "LiteLLM ElastiCache Redis"
engine = "redis"
engine_version = "7.1"
node_type = var.redis_node_type
num_cache_clusters = 1 + var.redis_num_replicas
parameter_group_name = "default.redis7"
port = 6379
subnet_group_name = aws_elasticache_subnet_group.this.name
security_group_ids = [aws_security_group.redis.id]
automatic_failover_enabled = var.redis_num_replicas >= 1
multi_az_enabled = var.redis_num_replicas >= 1
at_rest_encryption_enabled = true
transit_encryption_enabled = true
apply_immediately = true
}


@ -0,0 +1,80 @@
# General-purpose S3 bucket for the proxy. LiteLLM uses S3 for:
# - Cache backend (cache_params.s3_bucket_name in proxy_config)
# - Request log archival (S3_REQUEST_LOGS_BUCKET_NAME)
# - /v1/files endpoint passthrough storage
#
# The bucket name + region are exposed to gateway + backend as S3_BUCKET_NAME
# / S3_REGION_NAME so proxy_config can reference them via
# `os.environ/S3_BUCKET_NAME`. The task role is scoped to this bucket only.
resource "random_id" "s3_suffix" {
byte_length = 4
}
resource "aws_s3_bucket" "this" {
bucket = "${local.name}-${random_id.s3_suffix.hex}"
# Default false: `terraform destroy` refuses on a non-empty bucket so
# cached responses, archived request logs, and /v1/files storage stay put.
# Flip to true only for ephemeral / CI stacks (`var.s3_force_destroy`).
force_destroy = var.s3_force_destroy
}
resource "aws_s3_bucket_versioning" "this" {
bucket = aws_s3_bucket.this.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
bucket = aws_s3_bucket.this.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
resource "aws_s3_bucket_public_access_block" "this" {
bucket = aws_s3_bucket.this.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# Task role gains object-level read/write on this bucket. Bucket-level perms
# (list/location) are also scoped to this bucket only.
data "aws_iam_policy_document" "s3_access" {
statement {
actions = [
"s3:ListBucket",
"s3:GetBucketLocation",
]
resources = [aws_s3_bucket.this.arn]
}
statement {
actions = [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts",
]
resources = ["${aws_s3_bucket.this.arn}/*"]
}
}
resource "aws_iam_policy" "s3_access" {
name = "${local.name}-s3-access"
policy = data.aws_iam_policy_document.s3_access.json
}
resource "aws_iam_role_policy_attachment" "task_s3_access" {
role = aws_iam_role.task.name
policy_arn = aws_iam_policy.s3_access.arn
}


@ -0,0 +1,86 @@
resource "random_password" "master_key" {
length = 48
special = false
min_lower = 4
min_upper = 4
min_numeric = 4
}
# Master DB password used once to bootstrap the IAM-authed application
# user (see rds.tf header). Runtime services authenticate via IAM tokens
# and never read this secret.
resource "random_password" "db_master_password" {
length = 32
special = false
min_lower = 4
min_upper = 4
min_numeric = 4
}
# LITELLM_MASTER_KEY must begin with `sk-` per the proxy's validator.
resource "aws_secretsmanager_secret" "master_key" {
name = "${local.name}-master-key"
description = "LITELLM_MASTER_KEY for gateway + backend."
recovery_window_in_days = 0
}
resource "aws_secretsmanager_secret_version" "master_key" {
secret_id = aws_secretsmanager_secret.master_key.id
# When the operator passes litellm_master_key, use it verbatim. Otherwise
# fall back to the auto-generated `sk-` value (trial / OSS path).
secret_string = coalesce(var.litellm_master_key, "sk-${random_password.master_key.result}")
}
# LITELLM_LICENSE: only created when the operator supplies one. The
# task-execution role gets GetSecretValue via iam.tf, and gateway + backend
# pick the env var up through shared_secrets in ecs.tf.
resource "aws_secretsmanager_secret" "license" {
count = var.litellm_license == "" ? 0 : 1
name = "${local.name}-license"
description = "LITELLM_LICENSE for gateway + backend."
recovery_window_in_days = 0
}
resource "aws_secretsmanager_secret_version" "license" {
count = var.litellm_license == "" ? 0 : 1
secret_id = aws_secretsmanager_secret.license[0].id
secret_string = var.litellm_license
}
# UI_PASSWORD (backend-only). Same pattern as license: only created when
# the operator supplies one. The execution role gets GetSecretValue via
# iam.tf, and the backend task picks the env var up through
# backend_managed_secrets in ecs.tf.
resource "aws_secretsmanager_secret" "ui_password" {
count = var.ui_password == "" ? 0 : 1
name = "${local.name}-ui-password"
description = "UI_PASSWORD for the backend (UI admin login)."
recovery_window_in_days = 0
}
resource "aws_secretsmanager_secret_version" "ui_password" {
count = var.ui_password == "" ? 0 : 1
secret_id = aws_secretsmanager_secret.ui_password[0].id
secret_string = var.ui_password
}
resource "aws_secretsmanager_secret" "db_master_password" {
name = "${local.name}-db-master-password"
description = "Aurora master-user password - bootstrap only. Runtime auth is IAM-token."
recovery_window_in_days = 0
}
resource "aws_secretsmanager_secret_version" "db_master_password" {
secret_id = aws_secretsmanager_secret.db_master_password.id
secret_string = jsonencode({
username = var.db_master_username
password = random_password.db_master_password.result
host = aws_rds_cluster.this.endpoint
port = aws_rds_cluster.this.port
dbname = var.db_name
})
}


@ -0,0 +1,88 @@
region = "us-west-2"
azs = ["us-west-2a", "us-west-2b"]
# Resource naming: every AWS resource the stack creates is named
# `${tenant}-litellm-${env}` (or that plus a per-resource suffix). E.g.
# tenant="acme" + env="stage" → ALB `acme-litellm-stage`, ECS service
# `acme-litellm-stage-gateway`, etc.
tenant = "acme"
env = "stage"
# Tenant-supplied secrets. Prefer TF_VAR_litellm_master_key /
# TF_VAR_litellm_license / TF_VAR_ui_password env vars so the values don't
# end up in a committed tfvars file. All three are optional — when
# omitted the stack auto-generates a master key, runs without a license,
# and falls back to LITELLM_MASTER_KEY for UI login.
# litellm_master_key = "sk-..."
# litellm_license = "lic-..."
# ui_password = "..."
# TLS: provide an ACM cert for production. Without one, plan fails unless
# allow_plaintext_alb = true is set explicitly (trial/dev only).
# acm_certificate_arn = "arn:aws:acm:us-west-2:111122223333:certificate/..."
# allow_plaintext_alb = true
# Storage retention: false (default) makes `terraform destroy` refuse on a
# non-empty bucket. Flip to true only for ephemeral / CI stacks.
# s3_force_destroy = false
# Component images. Defaults pin all four to the same GHCR release tag —
# bump them together when bumping LiteLLM. Override here to pull from a
# private registry or to mix-and-match versions.
# gateway_image = "ghcr.io/berriai/litellm-gateway:v1.86.0-dev"
# backend_image = "ghcr.io/berriai/litellm-backend:v1.86.0-dev"
# ui_image = "ghcr.io/berriai/litellm-ui:v1.86.0-dev"
# migrations_image = "ghcr.io/berriai/litellm-migrations:v1.86.0-dev"
# Per-task sizing for the gateway. Defaults are 1 vCPU / 4 GiB / 1 worker.
# A common starting point (the Gunicorn docs' rule of thumb) is (2 * vCPU) + 1 workers.
# gateway_cpu = 1024 # 1024 = 1 vCPU
# gateway_memory = 4096 # MiB
# gateway_num_workers = 1
# ---------- proxy_config (mirrors helm gateway.config.proxy_config) ----------
# proxy_config = {
# model_list = [
# {
# model_name = "gpt-4o"
# litellm_params = {
# model = "openai/gpt-4o"
# api_key = "os.environ/OPENAI_API_KEY"
# }
# },
# {
# model_name = "claude-sonnet-4-6"
# litellm_params = {
# model = "anthropic/claude-sonnet-4-6"
# api_key = "os.environ/ANTHROPIC_API_KEY"
# }
# },
# ]
# general_settings = {
# master_key = "os.environ/LITELLM_MASTER_KEY"
# database_url = "os.environ/DATABASE_URL"
# }
# }
# ---------- Extra env / secrets ----------
# Plain-text env vars (non-sensitive). Land directly in the ECS task def.
# gateway_extra_env = {
# LANGFUSE_HOST = "https://us.cloud.langfuse.com"
# }
# Backend env vars commonly tuned in prod: SSO redirect, docs branding,
# UI admin username. UI_PASSWORD is its own first-class var (see top).
# backend_extra_env = {
# AUTO_REDIRECT_UI_LOGIN_TO_SSO = "true"
# DOCS_TITLE = "Acme LiteLLM"
# UI_USERNAME = "admin"
# }
# Provider API keys, sourced from existing Secrets Manager entries. The
# execution role auto-gains GetSecretValue on each ARN listed here. The
# values you reference above as `os.environ/OPENAI_API_KEY` must appear
# here. Same shape works for backend_extra_secrets.
# gateway_extra_secrets = {
# OPENAI_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
# ANTHROPIC_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:anthropic-api-key-GhIjKl"
# }


@ -0,0 +1,458 @@
variable "region" {
description = "AWS region to deploy into."
type = string
}
variable "tenant" {
description = "Tenant slug — used as the prefix for every AWS resource the stack creates. Combined with var.env to form `<tenant>-litellm-<env>` (e.g. `acme-litellm-stage`)."
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{0,20}$", var.tenant))
error_message = "tenant must be 1-21 chars, lower-kebab-case, starting with a letter."
}
}
variable "env" {
description = "Environment suffix appended to every resource name (e.g. `stage`, `prod`, `dev`)."
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{0,8}$", var.env))
error_message = "env must be 1-9 chars, lower-kebab-case, starting with a letter."
}
}
variable "tags" {
description = "Additional tags merged into the provider default_tags."
type = map(string)
default = {}
}
# ---------- Tenant-supplied secrets ----------
#
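# All three default to "" so the stack stays usable for trial / OSS deploys.
# Set via TF_VAR_litellm_master_key / TF_VAR_litellm_license /
# TF_VAR_ui_password to keep the values out of tfvars files committed to a
# VCS. The values are still stored in Terraform state, so restrict access
# to the state backend.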
variable "litellm_master_key" {
description = <<-EOT
Pre-existing LITELLM_MASTER_KEY (must begin with `sk-`). When set, this
value is written to the master-key Secrets Manager entry. When empty,
the stack auto-generates a random `sk-` key (preserving today's
trial-deploy behavior).
EOT
type = string
default = ""
sensitive = true
}
variable "litellm_license" {
description = <<-EOT
LiteLLM enterprise license string. When set, the stack creates a
`<tenant>-litellm-<env>-license` Secrets Manager entry, grants the
task-execution role GetSecretValue on it, and exposes its value to
gateway + backend as `LITELLM_LICENSE`. Leave empty for OSS-only deploys.
EOT
type = string
default = ""
sensitive = true
}
variable "ui_password" {
description = <<-EOT
UI admin password. When set, the stack creates a
`<tenant>-litellm-<env>-ui-password` Secrets Manager entry, grants the
task-execution role GetSecretValue on it, and exposes its value to the
backend as `UI_PASSWORD`. Pair with `backend_extra_env.UI_USERNAME` to
set the matching username. Leave empty to skip; the proxy then falls
back to the LITELLM_MASTER_KEY for UI login.
EOT
type = string
default = ""
sensitive = true
}
# ---------- Networking ----------
variable "vpc_cidr" {
description = "CIDR block for the VPC."
type = string
default = "10.40.0.0/16"
}
variable "azs" {
description = "Availability zones to spread subnets across. At least 2 required for RDS and ALB."
type = list(string)
validation {
condition = length(var.azs) >= 2
error_message = "Provide at least 2 availability zones."
}
}
# ---------- Component images ----------
#
# Defaults pin the four componentized images at the same release tag on
# GHCR. Override per component in tfvars when needed, and bump all four
# together when bumping the LiteLLM release.
variable "gateway_image" {
description = "Container image for the gateway (data plane, port 4000). Tag must match a tag actually published to GHCR — the split images use the `v`-prefixed semver convention."
type = string
default = "ghcr.io/berriai/litellm-gateway:v1.86.0-dev"
}
variable "backend_image" {
description = "Container image for the backend (management API, port 4001)."
type = string
default = "ghcr.io/berriai/litellm-backend:v1.86.0-dev"
}
variable "ui_image" {
description = "Container image for the UI (nginx static export, port 3000)."
type = string
default = "ghcr.io/berriai/litellm-ui:v1.86.0-dev"
}
variable "migrations_image" {
description = <<-EOT
Container image for the one-off prisma migration task. Built from
`migrations/Dockerfile`: a slim image whose ENTRYPOINT runs
`python3 /app/run.py` (assembles DATABASE_URL from DATABASE_* env vars
via DatabaseURLSettings, then runs `prisma migrate deploy`). Should track
the same release tag as gateway/backend/ui.
EOT
type = string
default = "ghcr.io/berriai/litellm-migrations:v1.86.0-dev"
}
# ---------- Service sizing ----------
variable "gateway_cpu" {
description = "Fargate CPU units for the gateway task (1024 = 1 vCPU)."
type = number
default = 1024
}
variable "gateway_memory" {
description = "Fargate memory (MiB) for the gateway task."
type = number
default = 4096
}
variable "gateway_desired_count" {
description = "Desired number of gateway tasks."
type = number
default = 2
}
variable "gateway_num_workers" {
description = "uvicorn worker processes per gateway task (passed as --workers). Size relative to gateway_cpu; a common starting point, taken from the Gunicorn docs, is (2 × vCPU) + 1."
type = number
default = 1
validation {
condition = var.gateway_num_workers >= 1
error_message = "gateway_num_workers must be >= 1."
}
}
variable "backend_cpu" {
description = "Fargate CPU units for the backend task (1024 = 1 vCPU)."
type = number
default = 1024
}
variable "backend_memory" {
description = "Fargate memory (MiB) for the backend task. The proxy_server import chain alone needs >1 GiB; 4 GiB matches gateway."
type = number
default = 4096
}
variable "backend_desired_count" {
description = "Desired number of backend tasks."
type = number
default = 1
}
variable "ui_cpu" {
description = "Fargate CPU units for the UI task."
type = number
default = 256
}
variable "ui_memory" {
description = "Fargate memory (MiB) for the UI task."
type = number
default = 512
}
variable "ui_desired_count" {
description = "Desired number of UI tasks."
type = number
default = 1
}
# ---------- Autoscaling ----------
# Defaults mirror helm/litellm/values.yaml HPAs. The "*_desired_count" vars
# above seed the initial task count; once autoscaling is enabled, the service's
# desired_count is left to Application Auto Scaling (ecs.tf ignores future
# changes to it).
variable "gateway_autoscaling_enabled" {
description = "Toggle Application Auto Scaling target-tracking on the gateway service."
type = bool
default = true
}
variable "gateway_min_capacity" {
description = "Minimum gateway task count under autoscaling."
type = number
default = 1
}
variable "gateway_max_capacity" {
description = "Maximum gateway task count under autoscaling."
type = number
default = 10
}
variable "gateway_cpu_target" {
description = "Target average CPU utilization (%) for the gateway autoscaling policy."
type = number
default = 70
}
variable "gateway_memory_target" {
description = "Target average memory utilization (%) for the gateway autoscaling policy. Set 0 to skip the memory policy and scale on CPU only."
type = number
default = 80
}
variable "backend_autoscaling_enabled" {
description = "Toggle Application Auto Scaling target-tracking on the backend service."
type = bool
default = true
}
variable "backend_min_capacity" {
description = "Minimum backend task count under autoscaling."
type = number
default = 1
}
variable "backend_max_capacity" {
description = "Maximum backend task count under autoscaling."
type = number
default = 4
}
variable "backend_cpu_target" {
description = "Target average CPU utilization (%) for the backend autoscaling policy."
type = number
default = 70
}
variable "ui_autoscaling_enabled" {
description = "Toggle Application Auto Scaling target-tracking on the UI service. Off by default — UI is a static nginx export and one task is usually enough."
type = bool
default = false
}
variable "ui_min_capacity" {
description = "Minimum UI task count under autoscaling."
type = number
default = 1
}
variable "ui_max_capacity" {
description = "Maximum UI task count under autoscaling."
type = number
default = 3
}
variable "ui_cpu_target" {
description = "Target average CPU utilization (%) for the UI autoscaling policy."
type = number
default = 80
}
# ---------- RDS ----------
variable "db_instance_class" {
description = "Aurora instance class for both writer and reader."
type = string
default = "db.r6g.large"
}
variable "db_engine_version" {
description = "Aurora Postgres engine version. Major version drives the parameter-group family (aurora-postgresql<major>)."
type = string
default = "16.4"
}
variable "db_name" {
description = "Initial database name created on the Aurora cluster."
type = string
default = "litellm"
}
variable "db_master_username" {
description = "Aurora master (superuser) username — used only to bootstrap the IAM-authed application user."
type = string
default = "postgres"
}
variable "db_username" {
description = "IAM-authed Postgres user the proxy connects as. Must be CREATEd in the cluster and granted the rds_iam role — see terraform/litellm/aws/README.md."
type = string
default = "litellm_app"
}
# ---------- Redis ----------
variable "redis_node_type" {
description = "ElastiCache node type."
type = string
default = "cache.t4g.small"
}
variable "redis_num_replicas" {
description = "Number of read replicas in the Redis replication group. The primary plus this many replicas form the cluster — set to 0 for a single-node dev deployment, 1+ for HA. multi_az_enabled and automatic_failover_enabled require >= 1."
type = number
default = 1
validation {
condition = var.redis_num_replicas >= 0
error_message = "redis_num_replicas must be >= 0."
}
}
# ---------- TLS ----------
variable "acm_certificate_arn" {
description = <<-EOT
ACM certificate ARN for the ALB's HTTPS listener. When set, the stack
provisions a 443 listener carrying the same path-routing rules as the 80
listener, and the 80 listener is rewritten to redirect HTTP → HTTPS. Leave
empty ("") to disable TLS (must combine with `allow_plaintext_alb = true`
for the plan to succeed; see README.md "TLS").
EOT
type = string
default = ""
}
variable "allow_plaintext_alb" {
description = <<-EOT
Opt into HTTP-only mode on the ALB (port 80, no TLS). Default false:
`terraform plan` fails when `acm_certificate_arn = ""` so the operator
must either provide an ACM cert or consciously opt out. Intended for
short-lived trial / dev stacks only.
EOT
type = bool
default = false
}
# ---------- Data retention (RDS + S3) ----------
variable "skip_final_snapshot" {
description = "Skip the Aurora final snapshot on `terraform destroy`. Default false — destroying the cluster takes a snapshot first so data is recoverable. Set true only for ephemeral / CI environments where you accept permanent data loss on destroy."
type = bool
default = false
}
variable "s3_force_destroy" {
description = <<-EOT
Allow `terraform destroy` to delete the S3 bucket even when it still
contains objects (request log archives, /v1/files storage, S3 cache
backend). Default false: destroying a non-empty bucket fails, acting
as a tripwire against accidental data loss. Set true only for
ephemeral / CI environments. Mirrors the safety posture of
`skip_final_snapshot` on Aurora.
EOT
type = bool
default = false
}
# ---------- Extra env ----------
variable "gateway_extra_env" {
description = <<-EOT
Additional plain-text env vars for the gateway container. Use this for
non-sensitive config (LANGFUSE_HOST, custom feature flags, etc.). For API
keys, use gateway_extra_secrets instead.
EOT
type = map(string)
default = {}
}
variable "backend_extra_env" {
description = "Additional plain-text env vars for the backend container."
type = map(string)
default = {}
}
variable "gateway_extra_secrets" {
description = <<-EOT
Extra env vars sourced from AWS Secrets Manager. Map of env-var name to
Secrets Manager ARN. Pass the bare secret ARN to inject the whole secret
string as the env var value, or append ":<jsonKey>::" to extract a single
JSON field (ECS docs).
Example for OPENAI_API_KEY:
gateway_extra_secrets = {
OPENAI_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
}
The stack's task execution role automatically gains GetSecretValue on every
ARN referenced here (suffix-stripped).
EOT
type = map(string)
default = {}
}
variable "backend_extra_secrets" {
description = "Same shape as gateway_extra_secrets, but layered onto the backend container."
type = map(string)
default = {}
}
variable "proxy_config" {
description = <<-EOT
LiteLLM proxy config (the contents of config.yaml). Mirrors the helm
chart's `gateway.config.proxy_config` value. Passed to gateway, backend,
and the migration task as a base64-encoded env var and decoded to
/tmp/litellm-config.yaml at container start; CONFIG_FILE_PATH is set
automatically.
Example:
proxy_config = {
model_list = [
{
model_name = "gpt-4o"
litellm_params = {
model = "openai/gpt-4o"
api_key = "os.environ/OPENAI_API_KEY"
}
},
]
general_settings = {
master_key = "os.environ/LITELLM_MASTER_KEY"
database_url = "os.environ/DATABASE_URL"
ui_username = "admin"
}
}
Leave empty ({}) to skip mounting a config; the proxy then runs with
defaults. Use the "os.environ/<NAME>" syntax in the YAML to reference
env vars provided by *_extra_env or *_extra_secrets.
EOT
type = any
default = {}
}
variable "log_retention_days" {
description = "CloudWatch log retention for the three services."
type = number
default = 30
}


@ -0,0 +1,14 @@
terraform {
required_version = ">= 1.6.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.60"
}
random = {
source = "hashicorp/random"
version = "~> 3.6"
}
}
}


@ -0,0 +1,62 @@
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.
provider "registry.terraform.io/hashicorp/google" {
version = "6.50.0"
constraints = "~> 6.10"
hashes = [
"h1:79CwMTsp3Ud1nOl5hFS5mxQHyT0fGVye7pqpU0PPlHI=",
"zh:1f3513fcfcbf7ca53d667a168c5067a4dd91a4d4cccd19743e248ff31065503c",
"zh:3da7db8fc2c51a77dd958ea8baaa05c29cd7f829bd8941c26e2ea9cb3aadc1e5",
"zh:3e09ac3f6ca8111cbb659d38c251771829f4347ab159a12db195e211c76068bb",
"zh:7bb9e41c568df15ccf1a8946037355eefb4dfb4e35e3b190808bb7c4abae547d",
"zh:81e5d78bdec7778e6d67b5c3544777505db40a826b6eb5abe9b86d4ba396866b",
"zh:8d309d020fb321525883f5c4ea864df3d5942b6087f6656d6d8b3a1377f340fc",
"zh:93e112559655ab95a523193158f4a4ac0f2bfed7eeaa712010b85ebb551d5071",
"zh:d3efe589ffd625b300cef5917c4629513f77e3a7b111c9df65075f76a46a63c7",
"zh:d4a4d672bbef756a870d8f32b35925f8ce2ef4f6bbd5b71a3cb764f1b6c85421",
"zh:e13a86bca299ba8a118e80d5f84fbdd708fe600ecdceea1a13d4919c068379fe",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
"zh:fec30c095647b583a246c39d557704947195a1b7d41f81e369ba377d997faef6",
]
}
provider "registry.terraform.io/hashicorp/google-beta" {
version = "6.50.0"
constraints = "~> 6.10"
hashes = [
"h1:P2GiUJM1frlPtBViwKn1A9V2dVBdGuWcX80w9TdH8ZE=",
"zh:18b442bd0a05321d39dda1e9e3f1bdede4e61bc2ac62cc7a67037a3864f75101",
"zh:2e387c51455862828bec923a3ec81abf63a4d998da470cf00e09003bda53d668",
"zh:3942e708fa84ebe54996086f4b1398cb747fe19cbcd0be07ace528291fb35dee",
"zh:496287dd48b34ae6197cb1f887abeafd07c33f389dbe431bb01e24846754cfdd",
"zh:6eca885419969ce5c2a706f34dce1f10bde9774757675f2d8a92d12e5a1be390",
"zh:710dbef826c3fe7f76f844dae47937e8e4c1279dd9205ec4610be04cf3327244",
"zh:777ebf44b24bfc7bdbf770dc089f1a72f143b4718fdedb8c6bd75983115a1ec2",
"zh:9c8703bba37b8c7ad857efc3513392c5a096c519397c1cb822d7612f38e4262f",
"zh:c4f1d3a73de2702277c99d5348ad6d374705bcfdd367ad964ff4cfd2cf06c281",
"zh:eca8df11af3f5a948492d5b8b5d01b4ec705aad10bc30ec1524205508ae28393",
"zh:f41e7fd5f2628e8fd6b8ea136366923858f54428d1729898925469b862c275c2",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
]
}
provider "registry.terraform.io/hashicorp/random" {
version = "3.8.1"
constraints = "~> 3.6"
hashes = [
"h1:u8AKlWVDTH5r9YLSeswoVEjiY72Rt4/ch7U+61ZDkiQ=",
"zh:08dd03b918c7b55713026037c5400c48af5b9f468f483463321bd18e17b907b4",
"zh:0eee654a5542dc1d41920bbf2419032d6f0d5625b03bd81339e5b33394a3e0ae",
"zh:229665ddf060aa0ed315597908483eee5b818a17d09b6417a0f52fd9405c4f57",
"zh:2469d2e48f28076254a2a3fc327f184914566d9e40c5780b8d96ebf7205f8bc0",
"zh:37d7eb334d9561f335e748280f5535a384a88675af9a9eac439d4cfd663bcb66",
"zh:741101426a2f2c52dee37122f0f4a2f2d6af6d852cb1db634480a86398fa3511",
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
"zh:a902473f08ef8df62cfe6116bd6c157070a93f66622384300de235a533e9d4a9",
"zh:b85c511a23e57a2147355932b3b6dce2a11e856b941165793a0c3d7578d94d05",
"zh:c5172226d18eaac95b1daac80172287b69d4ce32750c82ad77fa0768be4ea4b8",
"zh:dab4434dba34aad569b0bc243c2d3f3ff86dd7740def373f2a49816bd2ff819b",
"zh:f49fd62aa8c5525a5c17abd51e27ca5e213881d58882fd42fec4a545b53c9699",
]
}


@ -0,0 +1,296 @@
# LiteLLM on GCP (Cloud Run)
Deploys the componentized LiteLLM proxy on GCP:
- **VPC** + Private Services Access range + a Serverless VPC Access connector
so Cloud Run can reach private IPs
- **Cloud SQL for PostgreSQL** — primary instance + cross-zone read replica,
password auth via Secret Manager
- **Memorystore (Redis)** for caching + rate limiting, private IP only
- **GCS bucket** — private, versioned, uniform IAM; exposed as `GCS_BUCKET_NAME`
- **Secret Manager** entries for `LITELLM_MASTER_KEY` and `DATABASE_PASSWORD`
- **Cloud Run v2** services for `gateway` (port 4000), `backend` (port 4001),
and `ui` (port 3000), all using a shared runtime service account
- **Cloud Run Job** (`litellm-migrations`) that runs `prisma migrate deploy` from the dedicated `ghcr.io/berriai/litellm-migrations` image
- **External global HTTP(S) load balancer** with serverless NEGs and a URL
map mirroring the helm-chart ingress path routing:
- LLM data-plane prefixes → `gateway`
- UI asset paths → `ui`
- Everything else → `backend`
## Image pulls
There are four images: `litellm-gateway`, `litellm-backend`, `litellm-ui`,
and `litellm-migrations` (slim image used only by the one-off Cloud Run
Job — runs `prisma migrate deploy` against the writer DB and exits).
Bump them together when bumping LiteLLM.
Cloud Run only accepts images from Artifact Registry, `[region.]gcr.io`,
or `docker.io`; `ghcr.io` URIs are rejected at apply time. The four
images are published to GHCR upstream, so any real deploy needs an
Artifact Registry remote repository pointed at GHCR.
**One-time setup (per project):** create a remote repo and let Cloud Run
pull through it.
```bash
gcloud artifacts repositories create litellm \
--repository-format=docker \
--location=us-central1 \
--mode=remote-repository \
--remote-repo-config-desc="GitHub Container Registry passthrough" \
--remote-docker-repo=https://ghcr.io
```
Then point the stack at it via `image_registry`:
```hcl
image_registry = "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai"
image_tag = "v1.86.0-dev"
```
The four `litellm-<component>:${image_tag}` URIs are composed from those
two vars. Set `gateway_image` / `backend_image` / `ui_image` /
`migrations_image` only if you need a per-component override (custom
build, different tag).
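
The composition described above might look roughly like the following inside the stack. This is a sketch rather than the stack's actual code: the `components` and `default_images` locals, and the convention that an empty string means "no override", are assumptions made for illustration.

```hcl
# Hypothetical sketch: variable and local names are illustrative only.
variable "image_registry" {
  type = string
}

variable "image_tag" {
  type = string
}

# Assumed convention: an empty string means "no per-component override".
variable "gateway_image" {
  type    = string
  default = ""
}

locals {
  components = ["gateway", "backend", "ui", "migrations"]

  # <registry>/litellm-<component>:<tag> for each component.
  default_images = {
    for c in local.components :
    c => "${var.image_registry}/litellm-${c}:${var.image_tag}"
  }

  # coalesce() skips empty strings, so the override wins only when it is set.
  gateway_image = coalesce(var.gateway_image, local.default_images["gateway"])
}
```

The same `coalesce` pattern would repeat for the backend, UI, and migrations images.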
Two further notes:
- The runtime SAs the stack creates do **not** need
`roles/artifactregistry.reader` — Cloud Run pulls images using the
per-project serverless agent
(`service-<project-num>@serverless-robot-prod.iam.gserviceaccount.com`),
not the runtime SA.
- For a fully air-gapped option, mirror the images into a regular AR
repository instead of a remote repo:
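```bash
TAG=v1.86.0-dev  # release tag to mirror
for c in gateway backend ui migrations; do
  docker pull "ghcr.io/berriai/litellm-$c:$TAG"
  # Keep the litellm- prefix so the composed litellm-<component> URIs resolve.
  docker tag "ghcr.io/berriai/litellm-$c:$TAG" \
    "us-central1-docker.pkg.dev/$PROJECT/litellm/litellm-$c:$TAG"
  docker push "us-central1-docker.pkg.dev/$PROJECT/litellm/litellm-$c:$TAG"
done
```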
then set `image_registry = "us-central1-docker.pkg.dev/$PROJECT/litellm"`
(drop the `/berriai` suffix — the mirrored layout has no org segment).
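
The one-time remote-repository setup from earlier in this section could also be managed in Terraform instead of with `gcloud`. A minimal sketch, assuming the `hashicorp/google` provider pinned by this stack; the resource name `ghcr_remote` and the project ID are placeholders, and this resource is not part of the stack itself:

```hcl
# Hypothetical: an Artifact Registry remote repository that proxies GHCR.
resource "google_artifact_registry_repository" "ghcr_remote" {
  project       = "my-gcp-project" # placeholder project ID
  location      = "us-central1"
  repository_id = "litellm"
  format        = "DOCKER"
  mode          = "REMOTE_REPOSITORY"
  description   = "GitHub Container Registry passthrough"

  remote_repository_config {
    description = "GHCR upstream"

    docker_repository {
      # Custom upstream, because GHCR is not one of the preset registries.
      custom_repository {
        uri = "https://ghcr.io"
      }
    }
  }
}
```

Keeping the repository in a separate root module is probably the safer choice, since the stack's images must already be pullable through it when the Cloud Run services are created.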
## Database authentication
LiteLLM's `init_iam_db_url_from_env()` mints **AWS RDS** tokens via boto3 —
it doesn't speak GCP IAM. To IAM-auth against Cloud SQL from Cloud Run you'd
need the Cloud SQL Auth Proxy as a sidecar, which complicates the service
spec. This stack therefore uses **password authentication**:
- A random password is generated and stored in Secret Manager
(`<name>-db-password`).
- Each Cloud Run service receives the password as `DATABASE_PASSWORD` via
`value_source.secret_key_ref`.
- The container's entrypoint shim assembles `DATABASE_URL` (and
`DATABASE_URL_READ_REPLICA`) from `DATABASE_HOST` / `DATABASE_PASSWORD`
before exec'ing uvicorn — so the password never appears in the service
spec or in logs.
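
The secret wiring in the list above can be sketched as follows. This is an illustration under assumptions, not the stack's `cloudrun.tf`: the service name, image URI, host IP, and secret name are placeholders.

```hcl
# Hypothetical sketch of passing the DB password by Secret Manager reference.
resource "google_cloud_run_v2_service" "example_gateway" {
  name     = "acme-litellm-stage-gateway" # placeholder
  location = "us-central1"

  template {
    containers {
      image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai/litellm-gateway:v1.86.0-dev"

      # Plain-text connection details live in the service spec.
      env {
        name  = "DATABASE_HOST"
        value = "10.0.0.5" # placeholder private IP of the Cloud SQL instance
      }

      # Only a reference to the secret is stored in the spec; Cloud Run
      # resolves the value when the container starts.
      env {
        name = "DATABASE_PASSWORD"
        value_source {
          secret_key_ref {
            secret  = "acme-litellm-stage-db-password" # placeholder secret name
            version = "latest"
          }
        }
      }
    }
  }
}
```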
If you need GCP-native IAM auth later, add `cloud-sql-proxy` as a sidecar
container under `template.containers` on each service (Cloud Run v2 supports
multiple containers; the migration job nests it one level deeper, under
`template.template.containers`) and replace the password-based URL with the
proxy's Unix socket.
## Configuring the proxy
### `proxy_config`
Mirrors the helm chart's `gateway.config.proxy_config`. The map is
YAML-encoded and base64-passed to gateway, backend, and the migration job;
each container decodes it to `/tmp/litellm-config.yaml` at startup and sets
`CONFIG_FILE_PATH`.
```hcl
proxy_config = {
model_list = [
{
model_name = "gpt-4o"
litellm_params = {
model = "openai/gpt-4o"
api_key = "os.environ/OPENAI_API_KEY"
}
},
]
general_settings = {
master_key = "os.environ/LITELLM_MASTER_KEY"
database_url = "os.environ/DATABASE_URL"
}
}
```
LiteLLM resolves `os.environ/<NAME>` references against the container
environment. Provider API keys belong in `*_extra_secrets` and are
referenced from the YAML by env-var name.
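
The encoding step described at the top of this section can be sketched with two built-in Terraform functions. The local and output names here are hypothetical, and the container-side decode step is not shown:

```hcl
variable "proxy_config" {
  type    = any
  default = {}
}

locals {
  # YAML-encode the map, then base64 it so it travels as a single env var value.
  proxy_config_b64 = base64encode(yamlencode(var.proxy_config))
}

# Exposed only so the encoded value can be inspected during experimentation.
output "proxy_config_b64" {
  value = local.proxy_config_b64
}
```

Base64 avoids quoting and newline problems that raw YAML would run into inside an env var.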
### Extra env / secrets
Non-sensitive env vars:
```hcl
gateway_extra_env = {
LANGFUSE_HOST = "https://us.cloud.langfuse.com"
}
```
Sensitive values — create the secret in Secret Manager first, then reference
its resource ID:
```bash
echo -n "sk-proj-..." | gcloud secrets create openai-api-key --data-file=-
```
```hcl
gateway_extra_secrets = {
OPENAI_API_KEY = "projects/my-gcp-project/secrets/openai-api-key"
}
```
The Cloud Run runtime SA auto-gains `roles/secretmanager.secretAccessor` on
every secret referenced. **Pass the bare secret resource ID only**:
`projects/.../secrets/openai-api-key`, never the version-suffixed form
`projects/.../secrets/openai-api-key/versions/3`. The Cloud Run
`secret_key_ref` binding and the stack's IAM `secret_id` grant both
reject the version suffix; version is always resolved as `latest`. If
you need a pinned version, edit `local.gateway_extra_secret_kv` in
`cloudrun.tf` directly to set `version = "3"` for the entry in question.
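
The automatic grant described above could be implemented along these lines. This is a sketch, not the stack's actual IAM code: `runtime_sa_email` is a hypothetical input standing in for the runtime service account the stack creates.

```hcl
variable "gateway_extra_secrets" {
  type    = map(string)
  default = {}
}

# Hypothetical input; the real stack creates its own runtime service account.
variable "runtime_sa_email" {
  type = string
}

# One IAM binding per referenced secret. secret_id identifies the secret,
# not a version, which is why version-suffixed IDs are rejected.
resource "google_secret_manager_secret_iam_member" "gateway_extra" {
  for_each = var.gateway_extra_secrets

  secret_id = each.value
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.runtime_sa_email}"
}
```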
## Tenant deployment
Every resource the stack creates is named `${tenant}-litellm-${env}` (or
that plus a per-resource suffix), so multiple tenants and multiple
environments coexist in the same project as long as the `(tenant, env)`
pair differs:
| `tenant` | `env` | Example resource name |
| -------- | ------- | ---------------------------------- |
| `acme` | `stage` | `acme-litellm-stage-gateway` |
| `acme` | `prod` | `acme-litellm-prod-master-key` |
| `globex` | `dev` | `globex-litellm-dev-license` |
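
Because the prefix is deterministic, resource names can be derived without reading state, which is handy for `gcloud` lookups. A minimal sketch, assuming a POSIX shell; `resource_name` is a hypothetical helper that mirrors `local.name` in `locals.tf`:

```bash
# Hypothetical helper mirroring local.name = "${tenant}-litellm-${env}".
# $1 = tenant, $2 = env, $3 = optional per-resource suffix
resource_name() {
  if [ -n "${3:-}" ]; then
    printf '%s-litellm-%s-%s\n' "$1" "$2" "$3"
  else
    printf '%s-litellm-%s\n' "$1" "$2"
  fi
}

resource_name acme stage gateway   # acme-litellm-stage-gateway
resource_name globex dev           # globex-litellm-dev
```
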
For a per-tenant instance, the only inputs that change are the tenant
slug, env, and the two pre-issued secrets:
```bash
export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE
terraform apply \
-var "project=my-gcp-project" \
-var "region=us-central1" \
-var "tenant=acme" \
-var "env=stage"
```
Both `litellm_master_key` and `litellm_license` are optional:
- Omit `litellm_master_key` → the stack auto-generates a random `sk-…`
value (trial/dev path).
- Omit `litellm_license` → no license secret is created and gateway/
backend run without `LITELLM_LICENSE` (OSS-only).
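Use `TF_VAR_*` env vars rather than tfvars files for these: values
written to a tfvars file sit on disk in plaintext and are easy to commit
by accident, for example by copying them into an example file. Terraform
generally also records managed secret values in `terraform.tfstate`
however they are supplied, so keep the state backend access-controlled.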
## Quick start
```bash
cd terraform/litellm/gcp
cp terraform.tfvars.example terraform.tfvars
# Edit: project, region, tenant, env, *_image, proxy_config, gateway_extra_secrets.
terraform init
terraform apply
```
That single apply provisions everything, runs the prisma schema migration via
the Cloud Run job (auto-triggered by `bootstrap.tf`), and only then starts the
gateway/backend services. When it returns, the stack is serving traffic.
```bash
terraform output lb_url
# UI login: admin / <master key>
gcloud secrets versions access latest --secret="$(terraform output -raw master_key_secret_id)"
```
The `migration_run_command` output is preserved for break-glass manual re-runs.
**Prerequisite**: `gcloud` must be authenticated (`gcloud auth login`) and the
required APIs must be enabled (run, sqladmin, redis, secretmanager,
vpcaccess, compute, servicenetworking, storage, artifactregistry).
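
The API list above uses short names; each corresponds to a `<name>.googleapis.com` service. The sketch below expands the list and wraps `gcloud services enable` in a function, assuming an authenticated `gcloud` and a project ID you supply. `api_hosts` and `enable_required_apis` are illustrative names, not part of the stack.

```bash
# Short API names from the prerequisite list above.
REQUIRED_APIS="run sqladmin redis secretmanager vpcaccess compute servicenetworking storage artifactregistry"

# Print one fully qualified service name per line.
api_hosts() {
  for api in $REQUIRED_APIS; do
    printf '%s.googleapis.com\n' "$api"
  done
}

# Enable everything in one call. $1 = GCP project ID.
# Defined but not invoked here; requires an authenticated gcloud.
enable_required_apis() {
  # Word splitting of $(api_hosts) is intentional: one argument per service.
  gcloud services enable $(api_hosts) --project "$1"
}
```

Running `enable_required_apis my-gcp-project` once before the first `terraform apply` should be enough, since enabling an already-enabled API is a no-op.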
## TLS
`terraform plan` fails for an HTTP-only LB by default, because TLS is the
supported posture. Two paths:
**Production / staging — set `lb_domains`:**
1. `terraform apply` once with `allow_plaintext_lb = true` (intentional
chicken-and-egg escape hatch) to provision the LB and read the anycast
IP from `terraform output -raw lb_ip`.
2. Point each DNS name you want to serve at that IP.
3. Set `lb_domains = ["proxy.example.com"]` and remove
`allow_plaintext_lb`; re-apply.
Result: a 443 forwarding rule with a Google-managed cert covering each
listed domain; the 80 forwarding rule is rewritten to serve a permanent
301 redirect to HTTPS, so HTTP clients are automatically upgraded. The
managed cert sits in `PROVISIONING` for ~15-60 min on first apply until
DNS propagation completes — `gcloud compute ssl-certificates describe
<tenant>-litellm-<env>-cert` shows the state.
**Trial / dev — explicitly opt into HTTP-only:**
Set `allow_plaintext_lb = true` and leave `lb_domains = []`. Without the
flag, plan fails with a clear error pointing at the precondition.
Intended for short-lived trial / dev stacks only.
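
Between steps 2 and 3 of the production path, it can help to confirm that each name already resolves to the LB address, since the managed cert stays in `PROVISIONING` until DNS points at it. A minimal sketch, assuming a POSIX shell; `resolves_to_lb` is a hypothetical helper, and the usage comment assumes `dig` is installed:

```bash
# Hypothetical helper, not part of the stack.
# $1 = LB IP, $2 = resolver output (one address per line).
# Succeeds when the LB IP is one of the resolved addresses.
resolves_to_lb() {
  printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# Usage (needs terraform state, network access, and dig):
#   resolves_to_lb "$(terraform output -raw lb_ip)" "$(dig +short proxy.example.com)" \
#     && echo "DNS ready" || echo "DNS not pointing at the LB yet"
```

The match is exact and whole-line, so a partially matching address such as `203.0.113.100` does not pass for `203.0.113.10`.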
## Storage and database retention
Two default-on tripwires guard against accidental data loss on
`terraform destroy`:
- **`cloudsql_deletion_protection`** (Cloud SQL writer + reader;
default `true`) — destroy fails with a clear error rather than
dropping the database.
- **`gcs_force_destroy`** (GCS bucket holding request log archives,
`/v1/files` content, and the GCS cache backend; default `false`) —
`terraform destroy` against a non-empty bucket fails.
Flip `cloudsql_deletion_protection` to `false` or `gcs_force_destroy` to
`true` only for ephemeral / CI stacks where you accept losing the data.
## Redis encryption
Memorystore runs with `transit_encryption_mode = "SERVER_AUTHENTICATION"`,
so the proxy connects via `rediss://`. The instance's self-signed CA cert
(`server_ca_certs[0].cert`) is shipped to gateway + backend as
`REDIS_CA_PEM_B64`; their entrypoint shell decodes it to `/tmp/redis-ca.pem`
before uvicorn starts and points `REDIS_SSL_CA_CERTS` at that path. No
extra config needed — but if you ever swap Memorystore for an external
Redis, override `REDIS_HOST`/`REDIS_PORT` and either drop these env vars
or point them at your own CA.
## Files
| File               | What's in it                                                                 |
| ------------------ | ---------------------------------------------------------------------------- |
| `versions.tf`      | Terraform + provider version constraints                                     |
| `providers.tf`     | Google + Google-Beta providers                                               |
| `variables.tf`     | All input variables                                                          |
| `locals.tf`        | Path-prefix lists (mirror of `helm/.../ingress.yaml`) + proxy_config helpers |
| `network.tf`       | VPC, subnet, PSA range, Serverless VPC connector                             |
| `secrets.tf`       | Secret Manager entries + random master_key                                   |
| `cloudsql.tf`      | Cloud SQL writer + read replica + app user + password secret                 |
| `redis.tf`         | Memorystore Redis (private IP)                                               |
| `gcs.tf`           | GCS bucket + objectAdmin binding                                             |
| `iam.tf`           | Runtime SA + Cloud SQL client + Secret Manager accessor                      |
| `cloudrun.tf`      | 3 Cloud Run services + Cloud Run Job for migrations                          |
| `bootstrap.tf`     | `terraform_data.migration`: runs the migration job during `terraform apply`  |
| `load_balancer.tf` | External HTTPS LB, serverless NEGs, URL map for path routing                 |
| `outputs.tf`       | LB IP, service URLs, secret IDs, migration `execute` command                 |

@ -0,0 +1,43 @@
# Auto-runs the prisma schema migration as part of `terraform apply`. Mirrors
# the AWS stack's terraform_data.migration in spirit. Cloud SQL doesn't need a
# separate user-bootstrap step because google_sql_user.app already creates the
# application user, so the only post-provisioning work is the migration.
#
# Gateway/backend Cloud Run services depend on this resource (in cloudrun.tf)
# so they don't go live until the schema is in place.
#
# Triggers:
# - re-runs if the migrations image changes (new release ships new prisma
# migration files).
# - re-runs if the migration job is recreated.
#
# Requires `gcloud` on the machine running terraform, authenticated
# (`gcloud auth login`) with credentials that can invoke the Cloud Run Admin API.
resource "terraform_data" "migration" {
triggers_replace = {
job_id = google_cloud_run_v2_job.migrations.id
job_image = local.migrations_image
}
provisioner "local-exec" {
interpreter = ["bash", "-c"]
environment = {
JOB = google_cloud_run_v2_job.migrations.name
REGION = var.region
PROJECT = var.project
}
command = <<-EOT
set -euo pipefail
gcloud run jobs execute "$JOB" \
--region "$REGION" \
--project "$PROJECT" \
--wait
EOT
}
depends_on = [
google_cloud_run_v2_job.migrations,
google_sql_user.app,
]
}

@ -0,0 +1,430 @@
# Three Cloud Run v2 services + one Cloud Run v2 job for migrations.
# All four use the same service account and the same VPC connector for
# private egress to Cloud SQL + Memorystore.
locals {
# Memorystore exposes a self-signed CA cert per instance; we ship it as
# a base64 env var and decode it to a file at container startup so the
# rediss:// connection can validate. Public cert, not sensitive.
redis_ca_pem_b64 = base64encode(google_redis_instance.this.server_ca_certs[0].cert)
shared_env_kv = [
{ name = "DATABASE_HOST", value = google_sql_database_instance.writer.private_ip_address },
{ name = "DATABASE_PORT", value = "5432" },
{ name = "DATABASE_USER", value = var.db_username },
{ name = "DATABASE_NAME", value = var.db_name },
{ name = "DATABASE_HOST_READ_REPLICA", value = google_sql_database_instance.reader.private_ip_address },
{ name = "DATABASE_PORT_READ_REPLICA", value = "5432" },
{ name = "REDIS_HOST", value = google_redis_instance.this.host },
{ name = "REDIS_PORT", value = tostring(google_redis_instance.this.port) },
# _redis.get_redis_url_from_environment honors REDIS_SSL to flip the
# scheme to rediss://; REDIS_SSL_CA_CERTS is mapped via
# _get_redis_env_kwarg_mapping to ssl_ca_certs on the redis-py client.
{ name = "REDIS_SSL", value = "true" },
{ name = "REDIS_SSL_CA_CERTS", value = "/tmp/redis-ca.pem" },
{ name = "REDIS_CA_PEM_B64", value = local.redis_ca_pem_b64 },
{ name = "GCS_BUCKET_NAME", value = google_storage_bucket.this.name },
]
# Cloud Run v2 secret env vars use value_source.secret_key_ref pointing at a
# secret resource ID. Shared between gateway and backend (the migrations
# job has its own narrower env list; see migrations_env_secrets below).
shared_env_secrets = concat(
[
{ name = "LITELLM_MASTER_KEY", secret = google_secret_manager_secret.master_key.id, version = "latest" },
{ name = "DATABASE_PASSWORD", secret = google_secret_manager_secret.db_password.id, version = "latest" },
],
var.litellm_license == "" ? [] : [
{ name = "LITELLM_LICENSE", secret = google_secret_manager_secret.license[0].id, version = "latest" },
],
)
# Backend-only managed secrets. UI_PASSWORD is consumed by the management
# API (UI login flow) and has no use on the gateway data plane.
backend_managed_env_secrets = var.ui_password == "" ? [] : [
{ name = "UI_PASSWORD", secret = google_secret_manager_secret.ui_password[0].id, version = "latest" },
]
# Per-component extras (from variables).
gateway_extra_env_kv = [
for k, v in var.gateway_extra_env : { name = k, value = v }
]
backend_extra_env_kv = [
for k, v in var.backend_extra_env : { name = k, value = v }
]
backend_default_env_kv = [
{ name = "STORE_MODEL_IN_DB", value = "true" },
]
gateway_extra_secret_kv = [
for k, v in var.gateway_extra_secrets : { name = k, secret = v, version = "latest" }
]
backend_extra_secret_kv = [
for k, v in var.backend_extra_secrets : { name = k, secret = v, version = "latest" }
]
# Shell fragments composed with && so any failure short-circuits the
# whole startup instead of falling through to `exec uvicorn`. The
# python step is only included when the caller provided a proxy_config.
proxy_config_fragment = local.proxy_config_enabled ? [
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\""
] : []
# Decode the Memorystore CA cert (passed as REDIS_CA_PEM_B64) to the
# path REDIS_SSL_CA_CERTS points at, so the redis-py client can validate
# the rediss:// handshake.
redis_ca_fragment = [
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['REDIS_SSL_CA_CERTS']).write_bytes(base64.b64decode(os.environ['REDIS_CA_PEM_B64']))\""
]
database_url_fragment = [
"export DATABASE_URL=\"postgresql://$${DATABASE_USER}:$${DATABASE_PASSWORD}@$${DATABASE_HOST}:$${DATABASE_PORT}/$${DATABASE_NAME}\"",
"export DATABASE_URL_READ_REPLICA=\"postgresql://$${DATABASE_USER}:$${DATABASE_PASSWORD}@$${DATABASE_HOST_READ_REPLICA}:$${DATABASE_PORT_READ_REPLICA}/$${DATABASE_NAME}\"",
]
gateway_args = join(" && ", concat(
local.proxy_config_fragment,
local.redis_ca_fragment,
local.database_url_fragment,
["exec uvicorn gateway.main:app --host 0.0.0.0 --port 4000"],
))
backend_args = join(" && ", concat(
local.proxy_config_fragment,
local.redis_ca_fragment,
local.database_url_fragment,
["exec uvicorn backend.main:app --host 0.0.0.0 --port 4001"],
))
# Env shipped to the migrations Job. The migrations image runs run.py
# which assembles DATABASE_URL from these discrete vars itself, so we
# only need writer-side DB env (no read replica, no proxy_config, no
# master key).
migrations_env_kv = [
{ name = "DATABASE_HOST", value = google_sql_database_instance.writer.private_ip_address },
{ name = "DATABASE_PORT", value = "5432" },
{ name = "DATABASE_USER", value = var.db_username },
{ name = "DATABASE_NAME", value = var.db_name },
]
migrations_env_secrets = [
{ name = "DATABASE_PASSWORD", secret = google_secret_manager_secret.db_password.id, version = "latest" },
]
}
# ---------- Gateway ----------
resource "google_cloud_run_v2_service" "gateway" {
name = "${local.name}-gateway"
location = var.region
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
template {
service_account = google_service_account.runtime.email
max_instance_request_concurrency = var.gateway_max_instance_request_concurrency
vpc_access {
connector = google_vpc_access_connector.this.id
egress = "PRIVATE_RANGES_ONLY"
}
scaling {
min_instance_count = var.gateway_min_instances
max_instance_count = var.gateway_max_instances
}
containers {
image = local.gateway_image
command = ["sh", "-c"]
args = [local.gateway_args]
ports {
container_port = 4000
}
resources {
limits = {
cpu = var.gateway_cpu
memory = var.gateway_memory
}
}
dynamic "env" {
for_each = concat(local.shared_env_kv, local.gateway_extra_env_kv, local.proxy_config_env)
content {
name = env.value.name
value = env.value.value
}
}
dynamic "env" {
for_each = concat(local.shared_env_secrets, local.gateway_extra_secret_kv)
content {
name = env.value.name
value_source {
secret_key_ref {
secret = env.value.secret
version = env.value.version
}
}
}
}
startup_probe {
http_get {
path = "/health/readiness"
port = 4000
}
initial_delay_seconds = 10
period_seconds = 10
timeout_seconds = 5
failure_threshold = 12
}
liveness_probe {
http_get {
path = "/health/liveliness"
port = 4000
}
period_seconds = 30
timeout_seconds = 5
}
}
}
depends_on = [
google_secret_manager_secret_iam_member.master_key,
google_secret_manager_secret_iam_member.db_password,
google_secret_manager_secret_iam_member.license,
google_secret_manager_secret_iam_member.extras,
google_sql_user.app,
# Don't go live until the schema is migrated; otherwise the proxy boots,
# fails on missing tables, and Cloud Run keeps cold-restarting.
terraform_data.migration,
]
}
# ---------- Backend ----------
resource "google_cloud_run_v2_service" "backend" {
name = "${local.name}-backend"
location = var.region
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
template {
service_account = google_service_account.runtime.email
max_instance_request_concurrency = var.backend_max_instance_request_concurrency
vpc_access {
connector = google_vpc_access_connector.this.id
egress = "PRIVATE_RANGES_ONLY"
}
scaling {
min_instance_count = var.backend_min_instances
max_instance_count = var.backend_max_instances
}
containers {
image = local.backend_image
command = ["sh", "-c"]
args = [local.backend_args]
ports {
container_port = 4001
}
resources {
limits = {
cpu = var.backend_cpu
memory = var.backend_memory
}
}
dynamic "env" {
for_each = concat(local.shared_env_kv, local.backend_default_env_kv, local.backend_extra_env_kv, local.proxy_config_env)
content {
name = env.value.name
value = env.value.value
}
}
dynamic "env" {
for_each = concat(local.shared_env_secrets, local.backend_managed_env_secrets, local.backend_extra_secret_kv)
content {
name = env.value.name
value_source {
secret_key_ref {
secret = env.value.secret
version = env.value.version
}
}
}
}
startup_probe {
http_get {
path = "/health/readiness"
port = 4001
}
initial_delay_seconds = 10
period_seconds = 10
timeout_seconds = 5
failure_threshold = 12
}
liveness_probe {
http_get {
path = "/health/liveliness"
port = 4001
}
period_seconds = 30
timeout_seconds = 5
}
}
}
depends_on = [
google_secret_manager_secret_iam_member.master_key,
google_secret_manager_secret_iam_member.db_password,
google_secret_manager_secret_iam_member.license,
google_secret_manager_secret_iam_member.ui_password,
google_secret_manager_secret_iam_member.extras,
google_sql_user.app,
terraform_data.migration,
]
}
# ---------- UI ----------
# Static nginx: no DB, no Redis, no secrets. Runs as ui_runtime, a SA
# with zero IAM bindings, so a compromised UI container can't pivot to
# Secret Manager / Cloud SQL via the metadata service.
resource "google_cloud_run_v2_service" "ui" {
name = "${local.name}-ui"
location = var.region
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
template {
service_account = google_service_account.ui_runtime.email
max_instance_request_concurrency = var.ui_max_instance_request_concurrency
scaling {
min_instance_count = var.ui_min_instances
max_instance_count = var.ui_max_instances
}
containers {
image = local.ui_image
ports {
container_port = 3000
}
resources {
limits = {
cpu = var.ui_cpu
memory = var.ui_memory
}
}
startup_probe {
http_get {
path = "/healthz"
port = 3000
}
initial_delay_seconds = 5
period_seconds = 10
timeout_seconds = 3
failure_threshold = 6
}
}
}
}
# Allow the LB (any unauthenticated traffic from the configured serverless
# NEG) to invoke the Cloud Run services. The actual auth is in the proxy
# (LITELLM_MASTER_KEY); these IAM bindings just open up Cloud Run's invoker
# gate so the LB request makes it to the container.
resource "google_cloud_run_v2_service_iam_member" "gateway_allusers" {
project = var.project
location = google_cloud_run_v2_service.gateway.location
name = google_cloud_run_v2_service.gateway.name
role = "roles/run.invoker"
member = "allUsers"
}
resource "google_cloud_run_v2_service_iam_member" "backend_allusers" {
project = var.project
location = google_cloud_run_v2_service.backend.location
name = google_cloud_run_v2_service.backend.name
role = "roles/run.invoker"
member = "allUsers"
}
resource "google_cloud_run_v2_service_iam_member" "ui_allusers" {
project = var.project
location = google_cloud_run_v2_service.ui.location
name = google_cloud_run_v2_service.ui.name
role = "roles/run.invoker"
member = "allUsers"
}
# ---------- Migrations job ----------
# Dedicated litellm-migrations image (slim). Its ENTRYPOINT runs run.py, which
# assembles DATABASE_URL from the DATABASE_* env vars and runs `prisma
# migrate deploy`. No proxy_config, no master key, no shell wrapper.
resource "google_cloud_run_v2_job" "migrations" {
name = "${local.name}-migrations"
location = var.region
template {
template {
service_account = google_service_account.runtime.email
vpc_access {
connector = google_vpc_access_connector.this.id
egress = "PRIVATE_RANGES_ONLY"
}
containers {
image = local.migrations_image
# Prisma's Node + Rust engine plus the v2 migration resolver
# routinely peaks above 1 GiB while applying the schema, so 2 GiB
# is the floor; 1 GiB OOM-kills mid-migrate. The limit below is set
# above that floor. CPU stays at 1 vCPU (Cloud Run requires >= 1 with
# concurrency > 1, and `prisma migrate deploy` is single-threaded so
# more buys nothing).
resources {
limits = {
cpu = "1000m"
memory = "4Gi"
}
}
dynamic "env" {
for_each = local.migrations_env_kv
content {
name = env.value.name
value = env.value.value
}
}
dynamic "env" {
for_each = local.migrations_env_secrets
content {
name = env.value.name
value_source {
secret_key_ref {
secret = env.value.secret
version = env.value.version
}
}
}
}
}
}
}
depends_on = [
google_secret_manager_secret_iam_member.db_password,
google_sql_user.app,
]
}

@ -0,0 +1,102 @@
# Cloud SQL for PostgreSQL: one primary + one read replica.
#
# Note on auth: LiteLLM's IAM-auth helper (rds_iam_token.py) mints AWS RDS
# tokens via boto3 and doesn't speak GCP IAM. Cloud SQL IAM auth from Cloud
# Run requires the Cloud SQL Auth Proxy as a sidecar, which complicates the
# Cloud Run service spec. We instead use password auth: a random password
# lives in Secret Manager and is injected into the Cloud Run services as
# DATABASE_PASSWORD. The writer's DATABASE_URL is assembled inside the
# container at startup; the reader URL is built from the replica's IP.
resource "google_sql_database_instance" "writer" {
name = local.name
region = var.region
database_version = var.db_version
depends_on = [google_service_networking_connection.psa]
settings {
# ENTERPRISE accepts the db-custom-* and db-n1-* tiers we default to.
# ENTERPRISE_PLUS only accepts db-perf-optimized-* and is ~3x the cost; set
# var.db_edition = "ENTERPRISE_PLUS" and change var.db_tier together if you
# want it.
edition = var.db_edition
tier = var.db_tier
availability_type = "REGIONAL"
disk_size = 20
disk_autoresize = true
backup_configuration {
enabled = true
point_in_time_recovery_enabled = true
start_time = "07:00"
}
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.this.id
}
insights_config {
query_insights_enabled = true
record_application_tags = true
record_client_address = true
}
}
deletion_protection = var.cloudsql_deletion_protection
}
resource "google_sql_database_instance" "reader" {
name = "${local.name}-reader"
region = var.region
database_version = var.db_version
master_instance_name = google_sql_database_instance.writer.name
depends_on = [google_service_networking_connection.psa]
settings {
edition = var.db_edition
tier = var.db_tier
availability_type = "ZONAL"
disk_autoresize = true
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.this.id
}
}
deletion_protection = var.cloudsql_deletion_protection
}
resource "google_sql_database" "this" {
name = var.db_name
instance = google_sql_database_instance.writer.name
}
resource "random_password" "db_password" {
length = 32
special = false
min_lower = 4
min_upper = 4
min_numeric = 4
}
resource "google_sql_user" "app" {
name = var.db_username
instance = google_sql_database_instance.writer.name
password = random_password.db_password.result
}
resource "google_secret_manager_secret" "db_password" {
secret_id = "${local.name}-db-password"
replication {
auto {}
}
}
resource "google_secret_manager_secret_version" "db_password" {
secret = google_secret_manager_secret.db_password.id
secret_data = random_password.db_password.result
}

@ -0,0 +1,29 @@
# General-purpose GCS bucket; same role as the AWS S3 bucket. The bucket
# name is exposed to gateway + backend as GCS_BUCKET_NAME; reference it
# from proxy_config via `os.environ/GCS_BUCKET_NAME`.
resource "random_id" "bucket_suffix" {
byte_length = 4
}
resource "google_storage_bucket" "this" {
name = "${var.project}-${local.name}-${random_id.bucket_suffix.hex}"
location = var.region
uniform_bucket_level_access = true
force_destroy = var.gcs_force_destroy
versioning {
enabled = true
}
public_access_prevention = "enforced"
labels = var.labels
}
# Cloud Run runtime SA gains object admin on this bucket only.
resource "google_storage_bucket_iam_member" "runtime" {
bucket = google_storage_bucket.this.name
role = "roles/storage.objectAdmin"
member = "serviceAccount:${google_service_account.runtime.email}"
}

@ -0,0 +1,71 @@
# Runtime SA used by the gateway, backend, and migration job. It has Cloud
# SQL client + Secret Manager accessor on every managed/extra secret. The
# UI deliberately uses a *different* SA (below) so a compromised UI
# container can't read master_key / db_password / license / ui_password /
# provider creds via the metadata service.
resource "google_service_account" "runtime" {
account_id = "${local.name}-runtime"
display_name = "LiteLLM Cloud Run runtime"
}
# UI runtime SA: no role bindings. The UI is static nginx with no DB,
# Redis, or Secret Manager dependencies, so its runtime identity should not
# be able to read any of those. Cloud Run pulls the UI image via the
# project's serverless service agent (not this SA), so it doesn't need
# artifactregistry.reader either.
resource "google_service_account" "ui_runtime" {
account_id = "${local.name}-ui-runtime"
display_name = "LiteLLM Cloud Run UI runtime (no data-plane access)"
}
# Cloud SQL client lets the Cloud Run services connect to the instance
# over private IP via the VPC connector.
resource "google_project_iam_member" "runtime_cloudsql" {
project = var.project
role = "roles/cloudsql.client"
member = "serviceAccount:${google_service_account.runtime.email}"
}
# Secret Manager accessor: managed secrets first (split out as separate
# resources because their IDs are computed-at-apply and can't drive a
# for_each).
resource "google_secret_manager_secret_iam_member" "master_key" {
secret_id = google_secret_manager_secret.master_key.id
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.runtime.email}"
}
resource "google_secret_manager_secret_iam_member" "db_password" {
secret_id = google_secret_manager_secret.db_password.id
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.runtime.email}"
}
# License secret accessor; only created when var.litellm_license is set.
resource "google_secret_manager_secret_iam_member" "license" {
count = var.litellm_license == "" ? 0 : 1
secret_id = google_secret_manager_secret.license[0].id
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.runtime.email}"
}
# UI password secret accessor; only created when var.ui_password is set.
resource "google_secret_manager_secret_iam_member" "ui_password" {
count = var.ui_password == "" ? 0 : 1
secret_id = google_secret_manager_secret.ui_password[0].id
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.runtime.email}"
}
# User-supplied extras. Dedupe on the secret resource ID: two different
# env-var names could reference the same secret, and we want exactly one
# IAM binding per (secret, role, member) tuple in state. The value lists
# are concatenated rather than merged by key, so an env-var name that
# gateway and backend map to different secrets still gets both bindings.
resource "google_secret_manager_secret_iam_member" "extras" {
for_each = toset(concat(values(var.gateway_extra_secrets), values(var.backend_extra_secrets)))
secret_id = each.value
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.runtime.email}"
}

@ -0,0 +1,185 @@
# External global HTTP(S) load balancer fronting all three Cloud Run
# services. URL map mirrors the helm-chart ingress path routing:
# - LLM data-plane paths -> gateway
# - UI asset paths -> ui
# - Everything else -> backend (management API: /key/*, /user/*, ...)
#
# With var.lb_domains empty the LB serves plain HTTP on port 80 only, which
# requires var.allow_plaintext_lb = true (see the precondition below). Set
# var.lb_domains to a list of DNS names already pointing at lb_ip and the
# stack provisions a Google-managed SSL cert + 443 forwarding rule, and the
# 80 forwarding rule is rewritten to redirect HTTP -> HTTPS via a
# redirect-only URL map.
locals {
tls_enabled = length(var.lb_domains) > 0
}
resource "google_compute_global_address" "lb" {
name = "${local.name}-lb-ip"
}
# Serverless NEGs: one per Cloud Run service.
resource "google_compute_region_network_endpoint_group" "gateway" {
name = "${local.name}-gateway-neg"
region = var.region
network_endpoint_type = "SERVERLESS"
cloud_run {
service = google_cloud_run_v2_service.gateway.name
}
}
resource "google_compute_region_network_endpoint_group" "backend" {
name = "${local.name}-backend-neg"
region = var.region
network_endpoint_type = "SERVERLESS"
cloud_run {
service = google_cloud_run_v2_service.backend.name
}
}
resource "google_compute_region_network_endpoint_group" "ui" {
name = "${local.name}-ui-neg"
region = var.region
network_endpoint_type = "SERVERLESS"
cloud_run {
service = google_cloud_run_v2_service.ui.name
}
}
# Backend services wrap each NEG.
resource "google_compute_backend_service" "gateway" {
name = "${local.name}-gateway-bs"
protocol = "HTTP"
load_balancing_scheme = "EXTERNAL_MANAGED"
backend {
group = google_compute_region_network_endpoint_group.gateway.id
}
}
resource "google_compute_backend_service" "backend" {
name = "${local.name}-backend-bs"
protocol = "HTTP"
load_balancing_scheme = "EXTERNAL_MANAGED"
backend {
group = google_compute_region_network_endpoint_group.backend.id
}
}
resource "google_compute_backend_service" "ui" {
name = "${local.name}-ui-bs"
protocol = "HTTP"
load_balancing_scheme = "EXTERNAL_MANAGED"
backend {
group = google_compute_region_network_endpoint_group.ui.id
}
}
# URL map. Default backend (management API). Path matchers route the
# gateway and UI prefixes elsewhere.
resource "google_compute_url_map" "this" {
name = local.name
default_service = google_compute_backend_service.backend.id
host_rule {
hosts = ["*"]
path_matcher = "main"
}
path_matcher {
name = "main"
default_service = google_compute_backend_service.backend.id
# UI paths (catch them before any /v1/* gateway rules so /favicon.ico
# and / take precedence).
path_rule {
paths = local.ui_path_prefixes
service = google_compute_backend_service.ui.id
}
# Gateway path prefixes. GCP URL maps cap a path_rule at 10 path globs,
# so chunk into rules of 10.
dynamic "path_rule" {
for_each = { for idx, chunk in chunklist(local.gateway_path_prefixes, 10) : idx => chunk }
content {
paths = path_rule.value
service = google_compute_backend_service.gateway.id
}
}
}
}
# Permanent HTTP -> HTTPS redirect URL map. Only attached to the port-80
# target proxy when TLS is enabled; otherwise the regular path-routing
# URL map is attached to the HTTP proxy and everything stays plaintext.
resource "google_compute_url_map" "https_redirect" {
count = local.tls_enabled ? 1 : 0
name = "${local.name}-redirect"
default_url_redirect {
https_redirect = true
redirect_response_code = "MOVED_PERMANENTLY_DEFAULT"
strip_query = false
}
}
resource "google_compute_target_http_proxy" "this" {
name = "${local.name}-http"
url_map = local.tls_enabled ? google_compute_url_map.https_redirect[0].id : google_compute_url_map.this.id
# Default-deny on the HTTP-only path: TLS is the supported posture.
# Operators must either supply DNS names or explicitly opt in.
lifecycle {
precondition {
condition = local.tls_enabled || var.allow_plaintext_lb
error_message = "LB has no HTTPS forwarding rule. Either set `lb_domains` to a list of DNS names you want a Google-managed cert for, or set `allow_plaintext_lb = true` to opt into HTTP-only (trial / dev only)."
}
}
}
resource "google_compute_global_forwarding_rule" "http" {
name = "${local.name}-http"
ip_protocol = "TCP"
port_range = "80"
load_balancing_scheme = "EXTERNAL_MANAGED"
ip_address = google_compute_global_address.lb.address
target = google_compute_target_http_proxy.this.id
}
# ---------- HTTPS (gated on var.lb_domains) ----------
#
# Google-managed certs require each listed domain to resolve to lb_ip
# *before* the cert provisions; on first apply the cert sits in
# PROVISIONING for ~15-60 min until DNS propagates. The LB starts serving
# 443 immediately, but TLS handshakes fail until the managed cert
# transitions to ACTIVE.
resource "google_compute_managed_ssl_certificate" "this" {
count = local.tls_enabled ? 1 : 0
name = "${local.name}-cert"
managed {
domains = var.lb_domains
}
}
resource "google_compute_target_https_proxy" "this" {
count = local.tls_enabled ? 1 : 0
name = "${local.name}-https"
url_map = google_compute_url_map.this.id
ssl_certificates = [google_compute_managed_ssl_certificate.this[0].id]
}
resource "google_compute_global_forwarding_rule" "https" {
count = local.tls_enabled ? 1 : 0
name = "${local.name}-https"
ip_protocol = "TCP"
port_range = "443"
load_balancing_scheme = "EXTERNAL_MANAGED"
ip_address = google_compute_global_address.lb.address
target = google_compute_target_https_proxy.this[0].id
}

@ -0,0 +1,79 @@
# Gateway path prefixes mirrored verbatim from gateway/routes/allowlist.py
# and helm/litellm/templates/ingress.yaml. URL maps use the "path matcher"
# rule with `paths` lists; up to 10 path globs per rule, up to 50 rules
# per matcher. Easily fits the gateway list in one rule per chunk-of-10.
locals {
# Every resource the stack creates is named `${tenant}-litellm-${env}`
# (or that with a per-resource suffix). Computed once here so the rest of
# the stack can reference local.name.
name = "${var.tenant}-litellm-${var.env}"
gateway_path_prefixes = [
"/v1/chat/*", "/chat/*",
"/v1/completions*", "/completions*",
"/v1/embeddings*", "/embeddings*",
"/v1/moderations*", "/moderations*",
"/v1/audio/*", "/audio/*",
"/v1/images/*", "/images/*",
"/v1/files*", "/files*",
"/v1/batches*", "/batches*",
"/v1/fine_tuning/*", "/fine_tuning/*",
"/v1/fine-tuning/*", "/fine-tuning/*",
"/v1/responses*", "/responses*",
"/v1/threads*", "/threads*",
"/v1/assistants*", "/assistants*",
"/v1/vector_stores*", "/vector_stores*",
"/v1/indexes*",
"/v1/models*", "/models*",
"/openai/*", "/engines/*",
"/v1/messages*", "/messages*",
"/v1/skills/*", "/v1/a2a/*",
"/v1/rerank*", "/v2/rerank*", "/rerank*",
"/v1/ocr*", "/ocr*",
"/v1/rag/*", "/rag/*",
"/v1/video/*", "/v1/videos/*", "/video/*", "/videos/*",
"/v1/search*", "/search*",
"/v1/containers/*", "/containers/*",
"/v1/evals/*",
"/v1/memory/*",
"/queue/chat/*",
"/v1beta/*",
"/interactions/*",
"/anthropic/*", "/azure/*", "/azure_ai/*", "/aws/*", "/bedrock/*",
"/cohere/*", "/gemini/*", "/google/*",
"/vertex_ai/*", "/vertex-ai/*",
"/assemblyai/*", "/eu.assemblyai/*",
"/langfuse/*", "/vllm/*",
"/mistral/*", "/groq/*", "/voyage/*", "/cursor/*", "/milvus/*",
"/openai_passthrough/*",
"/toolset/*",
"/v1/realtime*", "/realtime*",
"/health*", "/metrics", "/test*",
]
ui_path_prefixes = [
"/",
"/favicon.ico",
"/litellm-asset-prefix/*",
"/_next/*",
"/assets/*",
"/ui",
"/ui/*",
]
proxy_config_enabled = length(keys(var.proxy_config)) > 0
proxy_config_b64 = local.proxy_config_enabled ? base64encode(yamlencode(var.proxy_config)) : ""
proxy_config_env = local.proxy_config_enabled ? [
{ name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 },
{ name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
] : []
# Resolved image URIs: per-component override wins, otherwise compose
# from image_registry + image_tag. Cloud Run only accepts AR / gcr.io /
# docker.io paths; see variables.tf for the full constraint list.
gateway_image = var.gateway_image != "" ? var.gateway_image : "${var.image_registry}/litellm-gateway:${var.image_tag}"
backend_image = var.backend_image != "" ? var.backend_image : "${var.image_registry}/litellm-backend:${var.image_tag}"
ui_image = var.ui_image != "" ? var.ui_image : "${var.image_registry}/litellm-ui:${var.image_tag}"
migrations_image = var.migrations_image != "" ? var.migrations_image : "${var.image_registry}/litellm-migrations:${var.image_tag}"
}
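# Illustrative sketch, not referenced by the rest of the stack: one way to
# produce the chunks of 10 described at the top of this file is the built-in
# chunklist() function. The local name `example_gateway_path_chunks` is
# hypothetical; how the URL map actually consumes gateway_path_prefixes is
# not shown in this file.
locals {
  example_gateway_path_chunks = chunklist(local.gateway_path_prefixes, 10)
}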

@ -0,0 +1,46 @@
resource "google_compute_network" "this" {
name = local.name
auto_create_subnetworks = false
routing_mode = "REGIONAL"
}
resource "google_compute_subnetwork" "this" {
name = "${local.name}-${var.region}"
region = var.region
network = google_compute_network.this.id
ip_cidr_range = var.subnet_cidr
private_ip_google_access = true
}
# Private Services Access (PSA) range for Cloud SQL + Memorystore. Both
# managed services peer with the VPC over the connection below using
# addresses from this range.
resource "google_compute_global_address" "psa" {
name = "${local.name}-psa"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = google_compute_network.this.id
}
resource "google_service_networking_connection" "psa" {
network = google_compute_network.this.id
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = [google_compute_global_address.psa.name]
}
# Serverless VPC Access connector: required so Cloud Run can reach
# Cloud SQL / Memorystore private IPs via the PSA range.
#
# min/max instances are required by the API now (you can't just set
# machine_type alone). Defaults: 2 e2-micro instances, scaling up to 3. That
# is fine for low-to-moderate Cloud Run egress; bump max if your services
# push heavy private-network traffic.
resource "google_vpc_access_connector" "this" {
name = "${local.name}-conn"
region = var.region
network = google_compute_network.this.name
ip_cidr_range = var.vpc_connector_cidr
min_instances = 2
max_instances = 3
}

@ -0,0 +1,64 @@
output "lb_ip" {
description = "Global anycast IP of the external HTTPS load balancer."
value = google_compute_global_address.lb.address
}
output "lb_url" {
description = "Proxy URL. Switches scheme based on whether lb_domains is set; when TLS is enabled the URL points at the first listed domain (since managed certs are tied to the hostname, not the anycast IP). The dashboard is served at /, the API at /v1/*."
value = local.tls_enabled ? "https://${var.lb_domains[0]}" : "http://${google_compute_global_address.lb.address}"
}
output "gateway_service_url" {
description = "Default Cloud Run URL for the gateway (bypasses the LB)."
value = google_cloud_run_v2_service.gateway.uri
}
output "backend_service_url" {
description = "Default Cloud Run URL for the backend (bypasses the LB)."
value = google_cloud_run_v2_service.backend.uri
}
output "ui_service_url" {
description = "Default Cloud Run URL for the UI (bypasses the LB)."
value = google_cloud_run_v2_service.ui.uri
}
output "cloudsql_writer_ip" {
description = "Private IP of the Cloud SQL writer."
value = google_sql_database_instance.writer.private_ip_address
}
output "cloudsql_reader_ip" {
description = "Private IP of the Cloud SQL read replica."
value = google_sql_database_instance.reader.private_ip_address
}
output "redis_endpoint" {
description = "Memorystore Redis endpoint."
value = "${google_redis_instance.this.host}:${google_redis_instance.this.port}"
}
output "gcs_bucket" {
description = "GCS bucket name. Exposed to gateway + backend as GCS_BUCKET_NAME. Reference from proxy_config via `os.environ/GCS_BUCKET_NAME`."
value = google_storage_bucket.this.name
}
output "master_key_secret_id" {
description = "Secret Manager resource ID holding LITELLM_MASTER_KEY. Fetch with `gcloud secrets versions access latest --secret=<id>`."
value = google_secret_manager_secret.master_key.secret_id
}
output "db_password_secret_id" {
description = "Secret Manager resource ID holding the Cloud SQL app-user password."
value = google_secret_manager_secret.db_password.secret_id
}
output "migration_run_command" {
description = "Shell command that executes the one-off migration job against Cloud SQL. Run this once after the first apply."
value = format(
"gcloud run jobs execute %s --region %s --project %s --wait",
google_cloud_run_v2_job.migrations.name,
var.region,
var.project,
)
}
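# Illustrative sketch: a companion output that renders the fetch command
# quoted in the master_key_secret_id description, in the same style as
# migration_run_command. The output name is hypothetical and not part of the
# original stack.
output "example_master_key_fetch_command" {
  description = "Shell command that prints the current LITELLM_MASTER_KEY from Secret Manager."
  value = format(
    "gcloud secrets versions access latest --secret=%s --project %s",
    google_secret_manager_secret.master_key.secret_id,
    var.project,
  )
}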

@ -0,0 +1,9 @@
provider "google" {
project = var.project
region = var.region
}
provider "google-beta" {
project = var.project
region = var.region
}

@ -0,0 +1,20 @@
resource "google_redis_instance" "this" {
name = local.name
tier = var.redis_tier
memory_size_gb = var.redis_memory_size_gb
region = var.region
authorized_network = google_compute_network.this.id
connect_mode = "PRIVATE_SERVICE_ACCESS"
redis_version = "REDIS_7_0"
# In-transit encryption between Cloud Run and Memorystore. The instance
# exposes its self-signed CA via `server_ca_certs` (read in cloudrun.tf
# and passed to the proxy as REDIS_CA_PEM_B64); the proxy decodes it to
# /tmp/redis-ca.pem at startup and uses it to validate the rediss://
# handshake. Mirrors `transit_encryption_enabled = true` on AWS.
transit_encryption_mode = "SERVER_AUTHENTICATION"
depends_on = [google_service_networking_connection.psa]
}
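# Illustrative sketch of the encoding step described in the comment above.
# Assumption: the real expression lives in cloudrun.tf and may differ; the
# local name is hypothetical. server_ca_certs is a list, and each entry's
# `cert` attribute holds the PEM text.
locals {
  example_redis_ca_pem_b64 = base64encode(google_redis_instance.this.server_ca_certs[0].cert)
}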

@ -0,0 +1,62 @@
resource "random_password" "master_key" {
length = 48
special = false
min_lower = 4
min_upper = 4
min_numeric = 4
}
# LITELLM_MASTER_KEY (sk-) lives in Secret Manager. The Cloud Run service
# account gets accessor permission on it (see iam.tf).
resource "google_secret_manager_secret" "master_key" {
secret_id = "${local.name}-master-key"
replication {
auto {}
}
}
resource "google_secret_manager_secret_version" "master_key" {
secret = google_secret_manager_secret.master_key.id
# When the operator passes litellm_master_key, use it verbatim. Otherwise
# fall back to the auto-generated `sk-` value (trial / OSS path).
secret_data = coalesce(var.litellm_master_key, "sk-${random_password.master_key.result}")
}
# LITELLM_LICENSE: only created when the operator supplies one. The runtime
# SA gets accessor permission via iam.tf, and gateway + backend pick it up
# through shared_env_secrets in cloudrun.tf.
resource "google_secret_manager_secret" "license" {
count = var.litellm_license == "" ? 0 : 1
secret_id = "${local.name}-license"
replication {
auto {}
}
}
resource "google_secret_manager_secret_version" "license" {
count = var.litellm_license == "" ? 0 : 1
secret = google_secret_manager_secret.license[0].id
secret_data = var.litellm_license
}
# UI_PASSWORD is backend-only. Same pattern as license: only created when
# the operator supplies one. The runtime SA gets accessor permission via
# iam.tf, and the backend service picks the env var up through
# backend_managed_env_secrets in cloudrun.tf.
resource "google_secret_manager_secret" "ui_password" {
count = var.ui_password == "" ? 0 : 1
secret_id = "${local.name}-ui-password"
replication {
auto {}
}
}
resource "google_secret_manager_secret_version" "ui_password" {
count = var.ui_password == "" ? 0 : 1
secret = google_secret_manager_secret.ui_password[0].id
secret_data = var.ui_password
}

@ -0,0 +1,77 @@
project = "my-gcp-project"
region = "us-central1"
# Resource naming: every GCP resource the stack creates is named
# `${tenant}-litellm-${env}` (or that plus a per-resource suffix). E.g.
# tenant="acme" + env="stage" → Cloud Run service `acme-litellm-stage-gateway`,
# Cloud SQL instance `acme-litellm-stage`, etc.
tenant = "acme"
env = "stage"
# Tenant-supplied secrets. Prefer TF_VAR_litellm_master_key /
# TF_VAR_litellm_license / TF_VAR_ui_password env vars so the values don't
# end up in a committed tfvars file. All three are optional — when
# omitted the stack auto-generates a master key, runs without a license,
# and falls back to LITELLM_MASTER_KEY for UI login.
# litellm_master_key = "sk-..."
# litellm_license = "lic-..."
# ui_password = "..."
# TLS: provide DNS names already pointing at the LB IP for a Google-managed
# cert. Without one, plan fails unless allow_plaintext_lb = true is set
# explicitly (trial/dev only).
# lb_domains = ["proxy.example.com"]
# allow_plaintext_lb = true
# Storage and database retention. Defaults are safe — destroy preserves
# data. Flip these only for ephemeral / CI stacks.
# cloudsql_deletion_protection = true # default: refuse destroy on the DB
# gcs_force_destroy = false # default: refuse destroy on a non-empty bucket
# Component images. Defaults pin all four to the same GHCR release tag —
# bump them together when bumping LiteLLM. To use private images, mirror
# them into Artifact Registry first — Cloud Run only authenticates against
# AR / gcr.io.
# gateway_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/gateway:1.86.0-dev"
# backend_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/backend:1.86.0-dev"
# ui_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/ui:1.86.0-dev"
# migrations_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/migrations:1.86.0-dev"
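# Alternative sketch: rather than four full URIs, set image_registry to an
# Artifact Registry remote repository that proxies ghcr.io and let the stack
# compose `<image_registry>/litellm-<component>:<image_tag>`. The repository
# path below is a placeholder.
# image_registry = "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai"
# image_tag      = "v1.86.0-dev"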
# ---------- proxy_config (mirrors helm gateway.config.proxy_config) ----------
# proxy_config = {
# model_list = [
# {
# model_name = "gpt-4o"
# litellm_params = {
# model = "openai/gpt-4o"
# api_key = "os.environ/OPENAI_API_KEY"
# }
# },
# ]
# general_settings = {
# master_key = "os.environ/LITELLM_MASTER_KEY"
# database_url = "os.environ/DATABASE_URL"
# }
# }
# ---------- Extra env / secrets ----------
# Plain-text env vars (non-sensitive). Land directly in the Cloud Run service spec.
# gateway_extra_env = {
# LANGFUSE_HOST = "https://us.cloud.langfuse.com"
# }
# Backend env vars commonly tuned in prod: SSO redirect, docs branding,
# UI admin username. UI_PASSWORD is its own first-class var (see top).
# backend_extra_env = {
# AUTO_REDIRECT_UI_LOGIN_TO_SSO = "true"
# DOCS_TITLE = "Acme LiteLLM"
# UI_USERNAME = "admin"
# }
# Provider API keys — Secret Manager resource IDs (NOT secret values). The
# Cloud Run SA auto-gains roles/secretmanager.secretAccessor on every
# secret listed here. Same shape works for backend_extra_secrets.
# gateway_extra_secrets = {
# OPENAI_API_KEY = "projects/my-gcp-project/secrets/openai-api-key"
# ANTHROPIC_API_KEY = "projects/my-gcp-project/secrets/anthropic-api-key"
# }
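# The backend equivalent, as a sketch. The env-var name and secret ID below
# are placeholders, not values the stack requires.
# backend_extra_secrets = {
#   SMTP_PASSWORD = "projects/my-gcp-project/secrets/smtp-password"
# }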

@ -0,0 +1,405 @@
variable "project" {
description = "GCP project ID."
type = string
}
variable "region" {
description = "GCP region for VPC, Cloud SQL, Memorystore, Cloud Run, and the LB IP."
type = string
default = "us-central1"
}
variable "tenant" {
description = "Tenant slug — used as the prefix for every GCP resource the stack creates. Combined with var.env to form `<tenant>-litellm-<env>` (e.g. `acme-litellm-stage`)."
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{0,20}$", var.tenant))
error_message = "tenant must be 1-21 chars, lower-kebab-case, starting with a letter."
}
}
variable "env" {
description = "Environment suffix appended to every resource name (e.g. `stage`, `prod`, `dev`)."
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{0,8}$", var.env))
error_message = "env must be 1-9 chars, lower-kebab-case, starting with a letter."
}
}
variable "labels" {
description = "Resource labels merged into every label-supporting resource."
type = map(string)
default = {
"managed-by" = "terraform"
}
}
# ---------- Tenant-supplied secrets ----------
#
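# All three default to "" so the stack stays usable for trial / OSS deploys.
# Set via TF_VAR_litellm_master_key / TF_VAR_litellm_license /
# TF_VAR_ui_password to keep the values out of tfvars files committed to a
# VCS. They are still written to Terraform state, so protect the state backend.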
variable "litellm_master_key" {
description = <<-EOT
Pre-existing LITELLM_MASTER_KEY (must begin with `sk-`). When set, this
value is written to the master-key Secret Manager entry. When empty,
the stack auto-generates a random `sk-` key (preserving today's
trial-deploy behavior).
EOT
type = string
default = ""
sensitive = true
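  # Sketch, not part of the original stack: enforce the `sk-` prefix that the
  # description above asks for, while still allowing the empty default that
  # triggers auto-generation. startswith() needs Terraform 1.3 or newer.
  validation {
    condition     = var.litellm_master_key == "" || startswith(var.litellm_master_key, "sk-")
    error_message = "litellm_master_key must be empty or begin with sk-."
  }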
}
variable "litellm_license" {
description = <<-EOT
LiteLLM enterprise license string. When set, the stack creates a
`<tenant>-litellm-<env>-license` Secret Manager entry, grants the
runtime SA accessor on it, and exposes its value to gateway + backend
as `LITELLM_LICENSE`. Leave empty for OSS-only deploys.
EOT
type = string
default = ""
sensitive = true
}
variable "ui_password" {
description = <<-EOT
UI admin password. When set, the stack creates a
`<tenant>-litellm-<env>-ui-password` Secret Manager entry, grants the
runtime SA accessor on it, and exposes its value to the backend as
`UI_PASSWORD`. Pair with `backend_extra_env.UI_USERNAME` to set the
matching username. Leave empty to skip; the proxy then falls back to
the LITELLM_MASTER_KEY for UI login.
EOT
type = string
default = ""
sensitive = true
}
# ---------- Networking ----------
variable "subnet_cidr" {
description = "Primary CIDR block for the LiteLLM subnet."
type = string
default = "10.40.0.0/16"
}
variable "vpc_connector_cidr" {
description = "CIDR for the Serverless VPC Access connector. /28 required."
type = string
default = "10.41.0.0/28"
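  # Sketch, not part of the original stack: fail at plan time when the value
  # is not a parseable CIDR or does not use the /28 prefix the description
  # requires. cidrhost() is used only as a syntax check here.
  validation {
    condition     = can(cidrhost(var.vpc_connector_cidr, 0)) && endswith(var.vpc_connector_cidr, "/28")
    error_message = "vpc_connector_cidr must be a valid CIDR block with a /28 prefix."
  }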
}
# ---------- Component images ----------
#
# Cloud Run only pulls from Artifact Registry, [region.]gcr.io, or
# docker.io; it rejects arbitrary registries (notably ghcr.io) at apply
# time. The four images live on GHCR upstream, so any real deploy must
# either set `image_registry` to an Artifact Registry remote repository
# pointed at ghcr.io (e.g. `us-central1-docker.pkg.dev/my-proj/litellm/berriai`)
# or override the per-component `*_image` vars individually with full URIs.
variable "image_registry" {
description = <<-EOT
Registry path prefix used to compose the four LiteLLM image URIs as
`<image_registry>/litellm-<component>:<image_tag>`. The default
(`ghcr.io/berriai`) is rejected by Cloud Run at apply time. For
GHCR-backed deploys, create an Artifact Registry remote repository
pointed at `https://ghcr.io` and set this to that repo's path
(e.g. `us-central1-docker.pkg.dev/<project>/<remote-repo>/berriai`).
Per-component overrides (`gateway_image`, `backend_image`, `ui_image`,
`migrations_image`) bypass this entirely when set.
EOT
type = string
default = "ghcr.io/berriai"
}
variable "image_tag" {
description = "Tag applied to all four litellm-* images when composed from `image_registry`. Bump in lockstep when bumping LiteLLM. Must match a tag actually published to GHCR — the split images use the `v`-prefixed semver convention (e.g. `v1.86.0-dev`)."
type = string
default = "v1.86.0-dev"
}
variable "gateway_image" {
description = "Full image URI for the gateway. Empty (default) composes from `image_registry` + `image_tag`. Public images or Artifact Registry only — Cloud Run won't authenticate against arbitrary private registries."
type = string
default = ""
}
variable "backend_image" {
description = "Full image URI for the backend. Empty (default) composes from `image_registry` + `image_tag`."
type = string
default = ""
}
variable "ui_image" {
description = "Full image URI for the UI. Empty (default) composes from `image_registry` + `image_tag`."
type = string
default = ""
}
variable "migrations_image" {
description = <<-EOT
Full image URI for the one-off prisma migration Cloud Run Job. Empty
(default) composes from `image_registry` + `image_tag` as
`litellm-migrations`. Built from `migrations/Dockerfile`: a slim image
whose ENTRYPOINT runs `python3 /app/run.py` (assembles DATABASE_URL
from DATABASE_* env vars via DatabaseURLSettings, then runs
`prisma migrate deploy`). Should track the same release tag as
gateway/backend/ui.
EOT
type = string
default = ""
}
# ---------- Service sizing ----------
variable "gateway_cpu" {
description = "Cloud Run CPU per gateway instance."
type = string
default = "1000m"
}
variable "gateway_memory" {
description = "Cloud Run memory per gateway instance."
type = string
default = "4Gi"
}
# Cloud Run autoscales out of the box (driven by request concurrency and CPU
# utilization). The min/max bounds mirror the HPA replica bounds in
# helm/litellm/values.yaml so each stack scales over the same range. Cloud
# Run exposes no configurable CPU-utilization target; the request-concurrency
# knob below is the closest analog.
variable "gateway_min_instances" {
description = "Lower bound on gateway Cloud Run instances. Matches helm HPA minReplicas."
type = number
default = 1
}
variable "gateway_max_instances" {
description = "Upper bound on gateway Cloud Run instances. Matches helm HPA maxReplicas."
type = number
default = 10
}
variable "gateway_max_instance_request_concurrency" {
description = "Concurrent requests one gateway instance handles before Cloud Run scales out. Cloud Run v2 default is 80; lower it for LLM streams that pin a worker for tens of seconds."
type = number
default = 80
}
variable "backend_cpu" {
description = "Cloud Run CPU per backend instance. Cloud Run rejects sub-1 CPU when `backend_max_instance_request_concurrency > 1`, so the default is 1000m. Lower this only if you also drop concurrency to 1."
type = string
default = "1000m"
}
variable "backend_memory" {
description = "Cloud Run memory per backend instance."
type = string
default = "4Gi"
}
variable "backend_min_instances" {
description = "Lower bound on backend Cloud Run instances. Matches helm HPA minReplicas."
type = number
default = 1
}
variable "backend_max_instances" {
description = "Upper bound on backend Cloud Run instances. Matches helm HPA maxReplicas."
type = number
default = 4
}
variable "backend_max_instance_request_concurrency" {
description = "Concurrent requests one backend instance handles before Cloud Run scales out."
type = number
default = 80
}
variable "ui_cpu" {
description = "Cloud Run CPU per UI instance. Cloud Run rejects sub-1 CPU when `ui_max_instance_request_concurrency > 1`, so the default is 1000m. Lower this only if you also drop concurrency to 1 (which makes nginx scale 1:1 with traffic — almost never what you want)."
type = string
default = "1000m"
}
variable "ui_memory" {
description = "Cloud Run memory per UI instance. Cloud Run rejects `< 512Mi` when CPU is always-allocated (the default whenever `ui_min_instances > 0`), so the default is 512Mi."
type = string
default = "512Mi"
}
variable "ui_min_instances" {
description = "Lower bound on UI Cloud Run instances. Matches helm HPA minReplicas."
type = number
default = 1
}
variable "ui_max_instances" {
description = "Upper bound on UI Cloud Run instances. Matches helm HPA maxReplicas."
type = number
default = 3
}
variable "ui_max_instance_request_concurrency" {
description = "Concurrent requests one UI instance handles before Cloud Run scales out. The UI is static nginx, so this can be high."
type = number
default = 200
}
# ---------- Cloud SQL ----------
variable "db_tier" {
description = "Cloud SQL tier (machine type) for the writer instance."
type = string
default = "db-custom-2-7680"
}
variable "db_edition" {
description = "Cloud SQL edition. ENTERPRISE accepts the db-custom-* and db-n1-* tiers. ENTERPRISE_PLUS only accepts db-perf-optimized-* tiers and is ~3x cost — change db_tier in lockstep when switching."
type = string
default = "ENTERPRISE"
validation {
condition = contains(["ENTERPRISE", "ENTERPRISE_PLUS"], var.db_edition)
error_message = "db_edition must be ENTERPRISE or ENTERPRISE_PLUS."
}
}
variable "db_version" {
description = "Cloud SQL Postgres version."
type = string
default = "POSTGRES_16"
}
variable "db_name" {
description = "Initial database created on the Cloud SQL instance."
type = string
default = "litellm"
}
variable "db_username" {
description = "Application Postgres user (password-auth). Password is auto-generated and stored in Secret Manager."
type = string
default = "litellm_app"
}
# ---------- Load balancer / TLS ----------
variable "lb_domains" {
description = <<-EOT
DNS names for a Google-managed SSL certificate fronting the LB. When
non-empty, the stack provisions a 443 forwarding rule + HTTPS target
proxy + managed cert covering these domains, and the existing 80
forwarding rule serves a permanent 301 redirect to HTTPS. Leave empty
([]) to disable TLS (must combine with `allow_plaintext_lb = true` for
the plan to succeed; see README.md "TLS"). Each domain must already
resolve to the LB's anycast IP (`lb_ip` output) for managed-cert
provisioning to succeed.
EOT
type = list(string)
default = []
}
variable "allow_plaintext_lb" {
description = <<-EOT
Opt into HTTP-only mode on the load balancer (port 80, no TLS).
Default false: `terraform plan` fails when `lb_domains = []` so the
operator must either provide DNS names for a managed cert or
consciously opt out. Intended for short-lived trial / dev stacks only.
EOT
type = bool
default = false
}
# ---------- Data retention ----------
variable "cloudsql_deletion_protection" {
description = "Cloud SQL instance-level deletion protection (writer + reader). Default true — `terraform destroy` (and `terraform apply` operations that replace the instance) will fail with a clear error rather than silently dropping the database. Set false only for ephemeral / CI environments."
type = bool
default = true
}
variable "gcs_force_destroy" {
description = <<-EOT
Allow `terraform destroy` to delete the GCS bucket even when it still
contains objects (request log archives, /v1/files storage, GCS cache
backend). Default false: destroying a non-empty bucket fails, acting
as a tripwire against accidental data loss. Set true only for
ephemeral / CI environments. Mirrors `s3_force_destroy` on AWS and
`cloudsql_deletion_protection` on the database side.
EOT
type = bool
default = false
}
# ---------- Memorystore (Redis) ----------
variable "redis_tier" {
description = "Memorystore tier — STANDARD_HA for production, BASIC for dev."
type = string
default = "STANDARD_HA"
}
variable "redis_memory_size_gb" {
description = "Memorystore Redis memory size in GiB."
type = number
default = 1
}
# ---------- Extras / proxy_config ----------
variable "gateway_extra_env" {
description = "Plain-text env vars layered onto the gateway."
type = map(string)
default = {}
}
variable "backend_extra_env" {
description = "Plain-text env vars layered onto the backend."
type = map(string)
default = {}
}
variable "gateway_extra_secrets" {
description = <<-EOT
Extra env vars sourced from Google Secret Manager, applied to the
gateway. Map of env-var name to the Secret Manager **secret resource
ID** (`projects/<project>/secrets/<name>`, *not* a version resource
ID; the Cloud Run secret_key_ref binding and the stack's IAM grant
both reject `/versions/<n>` suffixes). Versions are always resolved
as `latest`; if you need a pinned version, edit
`local.gateway_extra_secret_kv` in `cloudrun.tf` directly.
Example:
gateway_extra_secrets = {
OPENAI_API_KEY = "projects/my-proj/secrets/openai-api-key"
}
The Cloud Run service account auto-gains roles/secretmanager.secretAccessor
on each secret listed here.
EOT
type = map(string)
default = {}
}
variable "backend_extra_secrets" {
description = "Same shape as gateway_extra_secrets (secret resource ID, version always `latest`), layered onto the backend."
type = map(string)
default = {}
}
variable "proxy_config" {
description = <<-EOT
LiteLLM proxy config (contents of config.yaml). Mirrors the helm chart's
`gateway.config.proxy_config`. Passed to gateway, backend, and the
migration job as a base64-encoded env var and decoded to
/tmp/litellm-config.yaml at container start; CONFIG_FILE_PATH is set
automatically. Reference env-injected secrets from the YAML via
`os.environ/<NAME>`. Leave empty ({}) to skip.
EOT
type = any
default = {}
}

@ -0,0 +1,18 @@
terraform {
required_version = ">= 1.6.0"
required_providers {
google = {
source = "hashicorp/google"
version = "~> 6.10"
}
google-beta = {
source = "hashicorp/google-beta"
version = "~> 6.10"
}
random = {
source = "hashicorp/random"
version = "~> 3.6"
}
}
}