
LiteLLM Terraform stacks

Two self-contained, reusable Terraform modules that deploy the componentized LiteLLM proxy — the gateway, backend, and UI as three independent containers (see helm/litellm/ for the canonical chart with the same split).

Each module declares no provider block of its own, so it can be called with count / for_each / depends_on and the caller controls region, assume-role / impersonation, aliases, and default_tags. A ready-to-run root that wires the provider lives at <stack>/examples/default/ — that's the one-command deploy path. To embed a stack in your own config, call the module by source:

```hcl
module "litellm" {
  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
  # ... inputs ...
}
```
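Because the module carries no provider block, the caller can hand it an aliased provider. The following is a minimal sketch, assuming a root configuration that targets a single AWS region. The alias name, the region, the `default_tags` contents, and the `tenant` / `env` values are illustrative, and the full input list lives in the module's own variables.

```hcl
# Root configuration (hypothetical): the provider is defined by the caller,
# not by the module.
provider "aws" {
  alias  = "use1"
  region = "us-east-1"

  default_tags {
    tags = { team = "platform" } # illustrative
  }
}

module "litellm" {
  source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"

  # Pass the aliased provider into the module explicitly.
  providers = {
    aws = aws.use1
  }

  tenant = "acme" # illustrative value
  env    = "dev"  # illustrative value
  # ... remaining inputs ...
}
```

The same pattern extends to `for_each` over several tenants or regions, since nothing inside the module pins a provider configuration.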

| Stack | Compute | Database (writer + reader) | Cache | Object store | Public entrypoint |
| --- | --- | --- | --- | --- | --- |
| `aws/` | ECS Fargate | Aurora Postgres (IAM auth) | ElastiCache | S3 | Application LB |
| `gcp/` | Cloud Run | Cloud SQL Postgres (password auth) | Memorystore | GCS | External HTTPS LB |

Each stack creates its own VPC and managed data stores — from <stack>/examples/default/, drop in a tfvars file and run terraform apply. Both stacks support a typed proxy_config input (mirrors helm/litellm's gateway.config.proxy_config) and per-component extra env vars / secret-manager refs.
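As a rough illustration of that flow, a tfvars file for the AWS example root might look like the following. The `tenant`, `env`, `proxy_config`, and `gateway_extra_env` input names come from this README. The values, the `model_list` contents (borrowed from LiteLLM's usual config layout), and the assumption that `gateway_extra_env` is a map of strings are illustrative, so check the module's variables before copying.

```hcl
# terraform.tfvars (illustrative values only)
tenant = "acme"
env    = "dev"

# Typed equivalent of the proxy's YAML config.
proxy_config = {
  model_list = [
    {
      model_name = "gpt-4o"
      litellm_params = {
        model   = "openai/gpt-4o"
        api_key = "os.environ/OPENAI_API_KEY" # read from the container env at runtime
      }
    }
  ]
}

# Assumed shape: map(string) of plain env vars for the gateway container.
gateway_extra_env = {
  LITELLM_LOG = "INFO"
}
```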

Components

The proxy is split into three deployables:

| Component | Default image | Port | Role |
| --- | --- | --- | --- |
| `gateway` | `ghcr.io/berriai/litellm-gateway:main-stable` | 4000 | LLM data plane (`/v1/chat/completions`, `/v1/embeddings`, …) |
| `backend` | `ghcr.io/berriai/litellm-backend:main-stable` | 4001 | Management API (`/key/`, `/user/`, `/team/`, `/model/`, …) |
| `ui` | `ghcr.io/berriai/litellm-ui:main-stable` | 3000 | Static Next.js dashboard served by nginx |

The load balancer routes gateway path prefixes (mirrored verbatim from gateway/routes/allowlist.py) to the gateway, UI asset paths (/, /litellm-asset-prefix/*, /_next/*, /favicon.ico) to the UI, and everything else to the backend.
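On AWS, this three-way split could be expressed with ALB listener rules along the following lines. This is a fragment, not the module's actual code: the resource names (`aws_lb.this`, the three target groups) are hypothetical and assumed to be declared elsewhere, and the gateway paths are a two-entry sample rather than the full allowlist.

```hcl
# Listener whose default action catches "everything else" and sends it to the backend.
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.this.arn # hypothetical LB resource
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.backend.arn # hypothetical
  }
}

# Gateway path prefixes (sample only) are matched first.
resource "aws_lb_listener_rule" "gateway" {
  listener_arn = aws_lb_listener.http.arn
  priority     = 10

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.gateway.arn # hypothetical
  }

  condition {
    path_pattern {
      values = ["/v1/chat/completions", "/v1/embeddings"]
    }
  }
}

# UI asset paths go to the nginx-served dashboard.
resource "aws_lb_listener_rule" "ui" {
  listener_arn = aws_lb_listener.http.arn
  priority     = 20

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ui.arn # hypothetical
  }

  condition {
    path_pattern {
      values = ["/", "/litellm-asset-prefix/*", "/_next/*", "/favicon.ico"]
    }
  }
}
```

Lower priority numbers are evaluated first, so a request that matches no rule falls through to the listener's default action.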

Architecture

AWS (`terraform/litellm/aws/`)

                        ┌───────────────────────────────────────┐
                        │            Public Internet            │
                        └─────────────────┬─────────────────────┘
                                          │ HTTP/80
                          ┌───────────────▼───────────────┐
                          │   Application Load Balancer   │
                          │   (path-routing listener)     │
                          └─┬─────────────┬─────────────┬─┘
                            │             │             │
            UI assets, /    │  /v1/chat,  │   /key/*    │
            /_next/*, …     │  /v1/embed, │   /user/*   │
                            │  …          │   …         │
              ┌─────────────▼───┐  ┌──────▼──────┐  ┌───▼──────────────┐
              │    ECS Service  │  │ ECS Service │  │   ECS Service    │
              │       (ui)      │  │  (gateway)  │  │    (backend)     │
              │   Fargate :3000 │  │ Fargate:4000│  │  Fargate :4001   │
              └─────────────────┘  └──────┬──────┘  └────────┬─────────┘
                                          │                  │
              ┌─── private subnets (one per AZ) ──────────────────────┐
              │                                                       │
              │   ┌────────────────────────┐    ┌────────────────┐   │
              │   │  Aurora Postgres       │    │  ElastiCache   │   │
              │   │  cluster (IAM auth)    │    │  Redis (1 node)│   │
              │   │  ┌───────┐  ┌───────┐  │    └────────────────┘   │
              │   │  │writer │  │reader │  │                         │
              │   │  └───────┘  └───────┘  │    ┌────────────────┐   │
              │   └────────────────────────┘    │  S3 bucket     │   │
              │                                  │  (versioned)   │   │
              │   ┌────────────────────────┐    └────────────────┘   │
              │   │  Secrets Manager       │                         │
              │   │  • LITELLM_MASTER_KEY  │    ┌────────────────┐   │
              │   │  • DB master password  │    │ One-off ECS    │   │
              │   │  • user-supplied API   │    │ task: prisma   │   │
              │   │    keys (referenced)   │    │ migrate deploy │   │
              │   └────────────────────────┘    └────────────────┘   │
              │                                                       │
              └─── VPC ───────────────────────────────────────────────┘
                          │ NAT gateway in one public subnet
                          ▼
                    egress to LLM providers

GCP (`terraform/litellm/gcp/`)

                        ┌───────────────────────────────────────┐
                        │            Public Internet            │
                        └─────────────────┬─────────────────────┘
                                          │ HTTP/80
                          ┌───────────────▼───────────────┐
                          │ External HTTPS Load Balancer  │
                          │   (global, URL map routing)   │
                          └─┬─────────────┬─────────────┬─┘
                            │             │             │
                            │ Serverless NEGs (one per service)
                            │             │             │
              ┌─────────────▼───┐  ┌──────▼──────┐  ┌───▼──────────────┐
              │   Cloud Run     │  │  Cloud Run  │  │    Cloud Run     │
              │      (ui)       │  │  (gateway)  │  │    (backend)     │
              │      :3000      │  │   :4000     │  │      :4001       │
              └─────────────────┘  └──────┬──────┘  └────────┬─────────┘
                                          │                  │
                                          │ Serverless VPC Access connector
              ┌─── VPC (private services access range) ──────────────────┐
              │                                                          │
              │   ┌────────────────────────┐    ┌──────────────────┐    │
              │   │  Cloud SQL Postgres    │    │  Memorystore     │    │
              │   │  ┌───────┐  ┌───────┐  │    │  Redis           │    │
              │   │  │writer │  │reader │  │    └──────────────────┘    │
              │   │  └───────┘  └───────┘  │                            │
              │   └────────────────────────┘    ┌──────────────────┐    │
              │                                  │  GCS bucket      │    │
              │   ┌────────────────────────┐    │  (versioned)     │    │
              │   │  Secret Manager        │    └──────────────────┘    │
              │   │  • LITELLM_MASTER_KEY  │                            │
              │   │  • DB password         │    ┌──────────────────┐    │
              │   │  • user-supplied API   │    │ Cloud Run Job:   │    │
              │   │    keys (referenced)   │    │ prisma migrate   │    │
              │   └────────────────────────┘    │ deploy           │    │
              │                                  └──────────────────┘    │
              └──────────────────────────────────────────────────────────┘

Images

Both stacks take per-component image references as variables. The defaults point at the public ghcr.io/berriai/litellm-<component>:main-stable images. On AWS that makes the stack runnable end-to-end without pre-flight setup; on GCP the images must first be made pullable through Artifact Registry. Pin to a specific tag for production. Registry support differs per cloud:

- AWS can pull from any registry the task execution role can reach. The role gets AmazonECSTaskExecutionRolePolicy attached, which grants ECR pull permissions for repositories in the same account.
- GCP Cloud Run only pulls from Artifact Registry, [region.]gcr.io, or docker.io. To use images hosted elsewhere (including the ghcr.io defaults), mirror them into Artifact Registry or front them with an Artifact Registry remote repository.
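A pinned set of image variables for the AWS stack might look like this sketch. The variable names are the ones this README lists for the stack; `<tag>` is a placeholder for whichever release you have validated, and pointing `migrations_image` at the backend image is an assumption based on the migration task running against that image.

```hcl
# terraform.tfvars: replace <tag> with a tested release instead of main-stable.
gateway_image = "ghcr.io/berriai/litellm-gateway:<tag>"
backend_image = "ghcr.io/berriai/litellm-backend:<tag>"
ui_image      = "ghcr.io/berriai/litellm-ui:<tag>"

# Assumption: the migration task reuses the backend image.
migrations_image = "ghcr.io/berriai/litellm-backend:<tag>"
```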

Migrations

LiteLLM's proxy runs prisma migrate deploy at startup, but on first apply the gateway and backend can race each other against the empty database. Both stacks therefore expose a one-off migration task that runs python litellm/proxy/prisma_migration.py against the backend image:

- AWS: an aws_ecs_task_definition (litellm-migrations). Run it with aws ecs run-task; the full command is printed in the migration_run_command output.
- GCP: a google_cloud_run_v2_job (litellm-migrations). Run it with gcloud run jobs execute; the full command is printed in the migration_run_command output.

The migration must complete once after the first terraform apply, before the gateway/backend services start serving traffic. terraform apply attempts it automatically via local-exec when the aws / gcloud CLI is available on the runner; otherwise run the printed command by hand.
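For orientation, an output that assembles the AWS command could be written roughly as below. This is a sketch, not the module's definition: the referenced resources (`aws_ecs_cluster.this`, `aws_ecs_task_definition.migrations`, `aws_subnet.private`, `aws_security_group.tasks`) are hypothetical names assumed to exist in the same configuration.

```hcl
output "migration_run_command" {
  description = "Copy-paste command that starts the one-off migration task."

  # Join the CLI arguments with spaces so the output is a single shell line.
  value = join(" ", [
    "aws ecs run-task",
    "--cluster ${aws_ecs_cluster.this.name}",
    "--task-definition ${aws_ecs_task_definition.migrations.family}",
    "--launch-type FARGATE",
    "--network-configuration",
    "\"awsvpcConfiguration={subnets=[${join(",", aws_subnet.private[*].id)}],securityGroups=[${aws_security_group.tasks.id}],assignPublicIp=DISABLED}\"",
  ])
}
```

After an apply, `terraform output -raw migration_run_command` prints the line without surrounding quotes, ready to paste into a shell.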

Feature parity between stacks

The two modules expose the same conceptual surface; concrete inputs differ only where the underlying cloud forces it.

| Capability | AWS input(s) | GCP input(s) |
| --- | --- | --- |
| Tenant + env naming | `tenant`, `env` | `tenant`, `env` |
| Pre-shared master key / license | `litellm_master_key`, `litellm_license` | `litellm_master_key`, `litellm_license` |
| UI admin password | `ui_password` | `ui_password` |
| Per-deployment tags / labels | `tags` (`map(string)`) | `labels` (`map(string)`) |
| TLS posture | `acm_certificate_arn`, `allow_plaintext_alb` | `lb_domains`, `allow_plaintext_lb` |
| Force destroy of object store | `s3_force_destroy` | `gcs_force_destroy` |
| Database deletion protection | `skip_final_snapshot` | `cloudsql_deletion_protection` |
| `proxy_config` (typed YAML map) | `proxy_config` | `proxy_config` |
| Extra plain env per component | `gateway_extra_env`, `backend_extra_env` | `gateway_extra_env`, `backend_extra_env` |
| Extra secret-backed env | `gateway_extra_secrets`, `backend_extra_secrets` (ARNs) | `gateway_extra_secrets`, `backend_extra_secrets` (resource IDs) |
| Uvicorn `--workers` on gateway | `gateway_num_workers` | `gateway_num_workers` |
| OpenTelemetry v2 (opt-in) | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret_arn` | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret` |

Each module stamps its own stack-identity tag (litellm:stack on AWS, litellm-stack on GCP, because GCP label keys forbid colons) plus managed-by = "terraform" onto every taggable / labelable resource, then merges var.tags / var.labels on top. On AWS, provider default_tags are merged in as well; where a key collides, the AWS provider gives precedence to the tag set on the resource.
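The layering can be pictured with Terraform's `merge()` function, where later arguments win on duplicate keys. The sketch below is self-contained but illustrative: the local name `common_tags` and the `"aws"` value for the stack tag are assumptions, not taken from the module.

```hcl
variable "tags" {
  description = "Caller-supplied tags."
  type        = map(string)
  default     = {}
}

locals {
  # Module-owned tags come first; var.tags is merged last, so a caller-supplied
  # key overrides a module-owned key of the same name.
  common_tags = merge(
    {
      "litellm:stack" = "aws" # assumed value
      "managed-by"    = "terraform"
    },
    var.tags,
  )
}

output "common_tags" {
  value = local.common_tags
}
```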

OTel is opt-in on both clouds: leave otel_endpoint empty and nothing OTel-related is added to the container env; set it and both gateway and backend get LITELLM_OTEL_V2=true plus the full OTEL_* block, with OTEL_SERVICE_NAME stamped per component (<tenant>-litellm-<env>-gateway and -backend). Any OTEL_* key set in gateway_extra_env / backend_extra_env wins for that service.
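A tfvars fragment that switches OTel on and overrides one key for the gateway might look like this. The input names are from the parity table; the collector URL is hypothetical, `otlp_http` is assumed to be an accepted exporter value, and `gateway_extra_env` is assumed to be a map of strings.

```hcl
# Setting otel_endpoint is what enables the OTel block on gateway and backend.
otel_endpoint         = "http://otel-collector.internal:4318" # hypothetical collector
otel_exporter         = "otlp_http"                           # assumed value
otel_environment_name = "dev"

# Per-service override: an OTEL_* key here wins for the gateway only.
gateway_extra_env = {
  OTEL_SERVICE_NAME = "edge-gateway" # illustrative name
}
```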

What's not included

- TLS certificates / custom domains. Both stacks expose plain-HTTP load balancers by default; bring your own ACM cert (AWS) or managed cert (GCP) and wire it into the LB (see the TLS posture inputs in the parity table above).
- Remote state backends. State is local by default; add an s3 or gcs backend block to versions.tf when graduating to a team environment.
- Observability beyond the cloud provider's defaults (CloudWatch logs on AWS, Cloud Logging on GCP). Wire your own Prometheus / Datadog / Langfuse via the *_extra_env variables, or turn on OTel v2 (see the parity table above).
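For the remote-state item, a backend block in the root configuration could look like the following sketch. The bucket, key, and region are placeholders for resources you would create yourself beforehand.

```hcl
terraform {
  # AWS variant. For GCP, use backend "gcs" with `bucket` and `prefix` instead.
  backend "s3" {
    bucket = "my-terraform-state"            # hypothetical, must already exist
    key    = "litellm/aws/terraform.tfstate" # hypothetical state path
    region = "us-east-1"
  }
}
```

After adding the block, `terraform init -migrate-state` moves the existing local state into the bucket.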

HCP Terraform no-code (1-click) deploy

Both stacks are publishable as no-code modules in HCP Terraform's private registry. The end-user flow is: open the no-code launch URL, fill in a few inputs, hit Create workspace, and HCP runs plan/apply against your cloud account using a variable-set of credentials (static keys or dynamic-credentials OIDC).

Required overrides the launcher must supply per stack:

- AWS (terraform/litellm/aws): region, azs, tenant, env. The image vars (gateway_image, backend_image, ui_image, migrations_image) can be left at their defaults; the GHCR images are anonymous-readable and ECS Fargate pulls them without extra credentials.
- GCP (terraform/litellm/gcp): project, tenant, env, and one of:
  - image_registry pointed at an Artifact Registry remote repository backed by https://ghcr.io (e.g. us-central1-docker.pkg.dev/<project>/litellm/berriai), so Cloud Run pulls the four upstream litellm-* images through it; or
  - all four per-component *_image URIs pointing at images mirrored into a regular Artifact Registry repo.

  The defaults (ghcr.io/berriai) cause Cloud Run admission to reject the service spec, because Cloud Run only authenticates against Artifact Registry, [region.]gcr.io, or docker.io. See terraform/litellm/gcp/README.md#image-pulls for the gcloud artifacts repositories create … --mode=remote-repository command that sets up the passthrough repo (one-time, per project).
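Put together, the GCP inputs for the remote-repository option might be filled in as below. The project ID, tenant, and env values are hypothetical; the registry path follows the example format given above and assumes the passthrough repository already exists.

```hcl
# GCP launch inputs (illustrative values)
project = "my-gcp-project"
tenant  = "acme"
env     = "dev"

# Artifact Registry remote repository that proxies ghcr.io/berriai.
image_registry = "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai"
```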

What still requires a manual step regardless of HCP no-code:

- The one-off migration task. The stacks auto-run it via local-exec during terraform apply, but that requires the aws / gcloud CLI on the runner. HCP-hosted runners don't have them; use an HCP agent pool with a custom image that includes the relevant CLI, or run the command printed in the migration_run_command output by hand after the first apply.
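To make the CLI dependency concrete, a local-exec hook of this kind on GCP could be sketched as follows. It is not the module's actual resource: `terraform_data.run_migrations`, `google_cloud_run_v2_job.migrations`, and `var.region` are hypothetical names, and the sketch assumes Terraform 1.4 or newer for `terraform_data`.

```hcl
resource "terraform_data" "run_migrations" {
  # Re-run the provisioner whenever the job resource is replaced.
  triggers_replace = [google_cloud_run_v2_job.migrations.id]

  provisioner "local-exec" {
    # Runs on the machine executing terraform apply, so gcloud must be
    # installed and authenticated there.
    command = "gcloud run jobs execute ${google_cloud_run_v2_job.migrations.name} --project ${var.project} --region ${var.region} --wait"
  }
}
```

On a runner without gcloud the provisioner fails the apply, which is why an agent pool image with the CLI, or a manual run, is needed.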