feat: add Terraform stacks for deploying LiteLLM on AWS and GCP (#27673)
- Add AWS ECS Fargate stack with Aurora Postgres (IAM auth), ElastiCache Redis, S3, ALB with path-based routing to gateway/backend/ui components, Application Auto Scaling, and automated DB bootstrap + prisma migration via local-exec provisioners
- Add GCP Cloud Run stack with Cloud SQL Postgres (password auth), Memorystore Redis, GCS, external HTTPS load balancer with serverless NEGs and URL map routing, and automated prisma migration via Cloud Run Job
- Both stacks support typed proxy_config input mirroring the helm chart's gateway.config.proxy_config, per-component extra env vars, and Secret Manager references for provider API keys
- Gateway/backend services depend on terraform_data.migration so they never start before the schema is in place, eliminating crash-loop windows on first apply
- AWS stack uses IAM database authentication with a one-shot Fargate bootstrap task that creates and grants the rds_iam role to the application user; GCP stack uses password auth assembled at container startup to avoid Cloud SQL Auth Proxy sidecar complexity
- Add .gitignore rules for Terraform state files, plan files, tfvars inputs, provider binaries, and crash logs while explicitly keeping .terraform.lock.hcl for provider version pinning
- Include terraform.tfvars.example files, provider lock files, and comprehensive README documentation covering architecture, TLS setup, image pull strategies, and quick-start instructions for both stacks

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
parent
fbe0ee81f1
commit
3d5a9ede05
19
.gitignore
vendored
@@ -101,4 +101,23 @@ STABILIZATION_TODO.md
|
||||
**/*.storageState.json
|
||||
**/coverage
|
||||
test-config
|
||||
|
||||
# ---------- Terraform ----------
|
||||
# Provider binaries + module cache — regenerated by `terraform init`.
|
||||
**/.terraform/
|
||||
# State files often contain secrets (DB passwords, API keys snapshotted from
|
||||
# data sources). Keep state in a remote backend, never in git.
|
||||
*.tfstate
|
||||
*.tfstate.*
|
||||
*.tfstate.backup
|
||||
# Plan files can also contain sensitive values (variables in plaintext).
|
||||
*.tfplan
|
||||
# User-specific variable inputs — example files (terraform.tfvars.example) are
|
||||
# tracked because they end in .example, which doesn't match the glob below.
|
||||
*.tfvars
|
||||
*.auto.tfvars
|
||||
crash.log
|
||||
crash.*.log
|
||||
# .terraform.lock.hcl is intentionally NOT ignored — it pins provider versions
|
||||
# and should be committed.
|
||||
.vscode
|
||||
159
terraform/litellm/README.md
Normal file
@@ -0,0 +1,159 @@
|
||||
# LiteLLM Terraform stacks
|
||||
|
||||
Two self-contained Terraform root modules that deploy the **componentized**
|
||||
LiteLLM proxy — the gateway, backend, and UI as three independent containers
|
||||
(see `helm/litellm/` for the canonical chart with the same split).
|
||||
|
||||
| Stack | Compute | Database (writer + reader) | Cache | Object store | Public entrypoint |
|
||||
| ------ | ----------- | ---------------------------------- | ----------- | ------------ | ------------------ |
|
||||
| `aws/` | ECS Fargate | Aurora Postgres (IAM auth) | ElastiCache | S3 | Application LB |
|
||||
| `gcp/` | Cloud Run | Cloud SQL Postgres (password auth) | Memorystore | GCS | External HTTPS LB |
|
||||
|
||||
Each stack creates its own VPC and managed data stores — drop in a tfvars
|
||||
file and run `terraform apply`. Both stacks support a typed `proxy_config`
|
||||
input (mirrors `helm/litellm`'s `gateway.config.proxy_config`) and per-component
|
||||
extra env vars / secret-manager refs.
|
||||
|
||||
## Components
|
||||
|
||||
The proxy is split into three deployables:
|
||||
|
||||
| Component | Default image | Port | Role |
|
||||
| --------- | ---------------------------------------- | ---- | -------------------------------------------------------------------- |
|
||||
| `gateway` | `ghcr.io/berriai/litellm-gateway:main-stable` | 4000 | LLM data plane (`/v1/chat/completions`, `/v1/embeddings`, …) |
|
||||
| `backend` | `ghcr.io/berriai/litellm-backend:main-stable` | 4001 | Management API (`/key/*`, `/user/*`, `/team/*`, `/model/*`, …) |
|
||||
| `ui` | `ghcr.io/berriai/litellm-ui:main-stable` | 3000 | Static Next.js dashboard served by nginx |
|
||||
|
||||
The load balancer routes gateway path prefixes (mirrored verbatim from
|
||||
`gateway/routes/allowlist.py`) to the gateway, UI asset paths (`/`,
|
||||
`/litellm-asset-prefix/*`, `/_next/*`, `/favicon.ico`) to the UI, and
|
||||
everything else to the backend.
|
||||
|
||||
## Architecture
|
||||
|
||||
### AWS (`terraform/litellm/aws/`)
|
||||
|
||||
```
|
||||
┌───────────────────────────────────────┐
|
||||
│ Public Internet │
|
||||
└─────────────────┬─────────────────────┘
|
||||
│ HTTP/80
|
||||
┌───────────────▼───────────────┐
|
||||
│ Application Load Balancer │
|
||||
│ (path-routing listener) │
|
||||
└─┬─────────────┬─────────────┬─┘
|
||||
│ │ │
|
||||
UI assets, / │ /v1/chat, │ /key/* │
|
||||
/_next/*, … │ /v1/embed, │ /user/* │
|
||||
│ … │ … │
|
||||
┌─────────────▼───┐ ┌──────▼──────┐ ┌───▼──────────────┐
|
||||
│ ECS Service │ │ ECS Service │ │ ECS Service │
|
||||
│ (ui) │ │ (gateway) │ │ (backend) │
|
||||
│ Fargate :3000 │ │ Fargate:4000│ │ Fargate :4001 │
|
||||
└─────────────────┘ └──────┬──────┘ └────────┬─────────┘
|
||||
│ │
|
||||
┌─── private subnets (one per AZ) ──────────────────────┐
|
||||
│ │
|
||||
│ ┌────────────────────────┐ ┌────────────────┐ │
|
||||
│ │ Aurora Postgres │ │ ElastiCache │ │
|
||||
│ │ cluster (IAM auth) │ │ Redis (1 node)│ │
|
||||
│ │ ┌───────┐ ┌───────┐ │ └────────────────┘ │
|
||||
│ │ │writer │ │reader │ │ │
|
||||
│ │ └───────┘ └───────┘ │ ┌────────────────┐ │
|
||||
│ └────────────────────────┘ │ S3 bucket │ │
|
||||
│ │ (versioned) │ │
|
||||
│ ┌────────────────────────┐ └────────────────┘ │
|
||||
│ │ Secrets Manager │ │
|
||||
│ │ • LITELLM_MASTER_KEY │ ┌────────────────┐ │
|
||||
│ │ • DB master password │ │ One-off ECS │ │
|
||||
│ │ • user-supplied API │ │ task: prisma │ │
|
||||
│ │ keys (referenced) │ │ migrate deploy │ │
|
||||
│ └────────────────────────┘ └────────────────┘ │
|
||||
│ │
|
||||
└─── VPC ───────────────────────────────────────────────┘
|
||||
│ NAT gateway in one public subnet
|
||||
▼
|
||||
egress to LLM providers
|
||||
```
|
||||
|
||||
### GCP (`terraform/litellm/gcp/`)
|
||||
|
||||
```
|
||||
┌───────────────────────────────────────┐
|
||||
│ Public Internet │
|
||||
└─────────────────┬─────────────────────┘
|
||||
│ HTTP/80
|
||||
┌───────────────▼───────────────┐
|
||||
│ External HTTPS Load Balancer │
|
||||
│ (global, URL map routing) │
|
||||
└─┬─────────────┬─────────────┬─┘
|
||||
│ │ │
|
||||
│ Serverless NEGs (one per service)
|
||||
│ │ │
|
||||
┌─────────────▼───┐ ┌──────▼──────┐ ┌───▼──────────────┐
|
||||
│ Cloud Run │ │ Cloud Run │ │ Cloud Run │
|
||||
│ (ui) │ │ (gateway) │ │ (backend) │
|
||||
│ :3000 │ │ :4000 │ │ :4001 │
|
||||
└─────────────────┘ └──────┬──────┘ └────────┬─────────┘
|
||||
│ │
|
||||
│ Serverless VPC Access connector
|
||||
┌─── VPC (private services access range) ──────────────────┐
|
||||
│ │
|
||||
│ ┌────────────────────────┐ ┌──────────────────┐ │
|
||||
│ │ Cloud SQL Postgres │ │ Memorystore │ │
|
||||
│ │ ┌───────┐ ┌───────┐ │ │ Redis │ │
|
||||
│ │ │writer │ │reader │ │ └──────────────────┘ │
|
||||
│ │ └───────┘ └───────┘ │ │
|
||||
│ └────────────────────────┘ ┌──────────────────┐ │
|
||||
│ │ GCS bucket │ │
|
||||
│ ┌────────────────────────┐ │ (versioned) │ │
|
||||
│ │ Secret Manager │ └──────────────────┘ │
|
||||
│ │ • LITELLM_MASTER_KEY │ │
|
||||
│ │ • DB password │ ┌──────────────────┐ │
|
||||
│ │ • user-supplied API │ │ Cloud Run Job: │ │
|
||||
│ │ keys (referenced) │ │ prisma migrate │ │
|
||||
│ └────────────────────────┘ │ deploy │ │
|
||||
│ └──────────────────┘ │
|
||||
└──────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Images
|
||||
|
||||
Both stacks take per-component image references as variables. The defaults
|
||||
point at the public `ghcr.io/berriai/litellm-<component>:main-stable`
|
||||
images, so the stack is runnable end-to-end without pre-flight setup —
|
||||
pin to a specific tag for production:
|
||||
|
||||
- **AWS** can pull from any registry the task execution role can reach.
|
||||
The role gets `AmazonECSTaskExecutionRolePolicy` attached, which grants
|
||||
ECR pull permissions for repositories in the same account.
|
||||
|
||||
- **GCP Cloud Run** can only pull from Artifact Registry or
|
||||
`gcr.io`-style registries. To use images hosted elsewhere, mirror them
|
||||
into Artifact Registry first.
|
||||
|
||||
## Migrations
|
||||
|
||||
LiteLLM's proxy runs `prisma migrate deploy` at startup, but on first apply
|
||||
the gateway/backend can race the empty database. Both stacks expose a
|
||||
one-off migration task that runs `python litellm/proxy/prisma_migration.py`
|
||||
against the backend image:
|
||||
|
||||
- AWS: an `aws_ecs_task_definition` (`litellm-migrations`). Run with
|
||||
`aws ecs run-task` — the command is printed in `terraform output`.
|
||||
- GCP: a `google_cloud_run_v2_job` (`litellm-migrations`). Run with
|
||||
`gcloud run jobs execute` — the command is printed in `terraform output`.
|
||||
|
||||
Run the migration job once after the first `terraform apply` and before the
|
||||
gateway/backend services start serving traffic.
|
||||
|
||||
## What's not included
|
||||
|
||||
- TLS certificates / custom domains. Both stacks expose plain-HTTP load
|
||||
balancers; bring your own ACM cert (AWS) or managed cert (GCP) and wire
|
||||
it into the LB resource.
|
||||
- Remote state backends. Default local state — add an `s3` or `gcs`
|
||||
backend block to `versions.tf` when graduating to a team environment.
|
||||
- Observability beyond the cloud provider's defaults (CloudWatch logs on
|
||||
AWS, Cloud Logging on GCP). Wire your own Prometheus / Datadog / Langfuse
|
||||
via the `*_extra_env` variables.
|
||||
45
terraform/litellm/aws/.terraform.lock.hcl
generated
Normal file
@@ -0,0 +1,45 @@
|
||||
# This file is maintained automatically by "terraform init".
|
||||
# Manual edits may be lost in future updates.
|
||||
|
||||
provider "registry.terraform.io/hashicorp/aws" {
|
||||
version = "5.100.0"
|
||||
constraints = "~> 5.60"
|
||||
hashes = [
|
||||
"h1:Ijt7pOlB7Tr7maGQIqtsLFbl7pSMIj06TVdkoSBcYOw=",
|
||||
"zh:054b8dd49f0549c9a7cc27d159e45327b7b65cf404da5e5a20da154b90b8a644",
|
||||
"zh:0b97bf8d5e03d15d83cc40b0530a1f84b459354939ba6f135a0086c20ebbe6b2",
|
||||
"zh:1589a2266af699cbd5d80737a0fe02e54ec9cf2ca54e7e00ac51c7359056f274",
|
||||
"zh:6330766f1d85f01ae6ea90d1b214b8b74cc8c1badc4696b165b36ddd4cc15f7b",
|
||||
"zh:7c8c2e30d8e55291b86fcb64bdf6c25489d538688545eb48fd74ad622e5d3862",
|
||||
"zh:99b1003bd9bd32ee323544da897148f46a527f622dc3971af63ea3e251596342",
|
||||
"zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
|
||||
"zh:9f8b909d3ec50ade83c8062290378b1ec553edef6a447c56dadc01a99f4eaa93",
|
||||
"zh:aaef921ff9aabaf8b1869a86d692ebd24fbd4e12c21205034bb679b9caf883a2",
|
||||
"zh:ac882313207aba00dd5a76dbd572a0ddc818bb9cbf5c9d61b28fe30efaec951e",
|
||||
"zh:bb64e8aff37becab373a1a0cc1080990785304141af42ed6aa3dd4913b000421",
|
||||
"zh:dfe495f6621df5540d9c92ad40b8067376350b005c637ea6efac5dc15028add4",
|
||||
"zh:f0ddf0eaf052766cfe09dea8200a946519f653c384ab4336e2a4a64fdd6310e9",
|
||||
"zh:f1b7e684f4c7ae1eed272b6de7d2049bb87a0275cb04dbb7cda6636f600699c9",
|
||||
"zh:ff461571e3f233699bf690db319dfe46aec75e58726636a0d97dd9ac6e32fb70",
|
||||
]
|
||||
}
|
||||
|
||||
provider "registry.terraform.io/hashicorp/random" {
|
||||
version = "3.8.1"
|
||||
constraints = "~> 3.6"
|
||||
hashes = [
|
||||
"h1:u8AKlWVDTH5r9YLSeswoVEjiY72Rt4/ch7U+61ZDkiQ=",
|
||||
"zh:08dd03b918c7b55713026037c5400c48af5b9f468f483463321bd18e17b907b4",
|
||||
"zh:0eee654a5542dc1d41920bbf2419032d6f0d5625b03bd81339e5b33394a3e0ae",
|
||||
"zh:229665ddf060aa0ed315597908483eee5b818a17d09b6417a0f52fd9405c4f57",
|
||||
"zh:2469d2e48f28076254a2a3fc327f184914566d9e40c5780b8d96ebf7205f8bc0",
|
||||
"zh:37d7eb334d9561f335e748280f5535a384a88675af9a9eac439d4cfd663bcb66",
|
||||
"zh:741101426a2f2c52dee37122f0f4a2f2d6af6d852cb1db634480a86398fa3511",
|
||||
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
|
||||
"zh:a902473f08ef8df62cfe6116bd6c157070a93f66622384300de235a533e9d4a9",
|
||||
"zh:b85c511a23e57a2147355932b3b6dce2a11e856b941165793a0c3d7578d94d05",
|
||||
"zh:c5172226d18eaac95b1daac80172287b69d4ce32750c82ad77fa0768be4ea4b8",
|
||||
"zh:dab4434dba34aad569b0bc243c2d3f3ff86dd7740def373f2a49816bd2ff819b",
|
||||
"zh:f49fd62aa8c5525a5c17abd51e27ca5e213881d58882fd42fec4a545b53c9699",
|
||||
]
|
||||
}
|
||||
254
terraform/litellm/aws/README.md
Normal file
@@ -0,0 +1,254 @@
|
||||
# LiteLLM on AWS (ECS Fargate)
|
||||
|
||||
Deploys the componentized LiteLLM proxy on AWS:
|
||||
|
||||
- **VPC** with public + private subnets across the AZs you pass in, one NAT gateway
|
||||
- **Aurora Postgres** cluster — one writer instance + one reader instance, **IAM database authentication enabled**
|
||||
- **ElastiCache Redis** (private, replication group with multi-AZ failover and at-rest + in-transit encryption) for caching + rate limiting
|
||||
- **S3 bucket** (private, versioned, SSE-S3) — exposed to gateway + backend as `S3_BUCKET_NAME` / `S3_REGION_NAME` for cache backend, request log archival, and `/v1/files` storage
|
||||
- **Secrets Manager** entries for `LITELLM_MASTER_KEY` (auto-generated, `sk-…`) and the Aurora master password (bootstrap-only)
|
||||
- **ECS Fargate cluster** running three services — `gateway`, `backend`, `ui`
|
||||
- **Application Load Balancer** (public, HTTP/80) with path-based routing:
|
||||
- LLM data-plane prefixes (`/v1/chat/*`, `/v1/embeddings`, …) → `gateway`
|
||||
- UI assets (`/`, `/_next/*`, `/litellm-asset-prefix/*`, …) → `ui`
|
||||
- Everything else (management API: `/key/*`, `/user/*`, …) → `backend`
|
||||
- **One-off migration task** (`litellm-migrations`) that runs `prisma migrate deploy` from the dedicated `ghcr.io/berriai/litellm-migrations` image
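
As a rough sketch, the routing tiers in the list above could be expressed as locals along these lines. The names `ui_exact_paths`, `ui_path_prefixes`, and `gateway_path_chunks` are the ones `alb.tf` consumes; `gateway_path_prefixes` is a hypothetical name, and the gateway list here is an abbreviated illustration rather than the full allowlist, which lives in the stack's `locals.tf`.

```hcl
locals {
  # Exact-match UI paths (listener rule priority 10).
  ui_exact_paths = ["/", "/favicon.ico", "/ui"]

  # Wildcard UI asset prefixes (listener rule priority 20).
  ui_path_prefixes = ["/_next/*", "/litellm-asset-prefix/*", "/assets/*", "/ui/*"]

  # Illustrative subset only (hypothetical local name, not the real list).
  gateway_path_prefixes = ["/v1/chat/*", "/v1/embeddings"]

  # Split into groups of at most five patterns, one listener rule per group
  # (priorities 100, 101, ...). Anything unmatched falls through to the
  # listener's default action, which forwards to the backend.
  gateway_path_chunks = chunklist(local.gateway_path_prefixes, 5)
}
```

Lower priority numbers are evaluated first, so the UI rules win over the gateway rules when a path matches both.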
|
||||
|
||||
## Aurora + IAM auth
|
||||
|
||||
The cluster runs with `iam_database_authentication_enabled = true`. Enabling
|
||||
that on the cluster doesn't by itself let any Postgres user log in with an IAM
|
||||
token — you also need to `CREATE USER ... GRANT rds_iam` once. `bootstrap.tf`
|
||||
does this automatically during `terraform apply` via a one-shot Fargate task
|
||||
(`postgres:16-alpine` running the bootstrap SQL with the master password from
|
||||
Secrets Manager). The SQL is idempotent, so re-applies are safe.
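
For orientation, idempotent bootstrap SQL of the kind described above might look like the following. This is a sketch under assumptions, not the contents of `bootstrap.tf`: the role name `litellm_app`, the database name `litellm`, and the local and output names are all hypothetical.

```hcl
locals {
  # Hypothetical application role and database names.
  app_db_user = "litellm_app"
  app_db_name = "litellm"

  # CREATE USER is guarded so a second run is a no-op; the GRANT statements
  # can be repeated safely.
  bootstrap_sql_sketch = <<-SQL
    DO $$
    BEGIN
      IF NOT EXISTS (SELECT 1 FROM pg_roles WHERE rolname = '${local.app_db_user}') THEN
        CREATE USER ${local.app_db_user};
      END IF;
    END
    $$;
    GRANT rds_iam TO ${local.app_db_user};
    GRANT ALL PRIVILEGES ON DATABASE ${local.app_db_name} TO ${local.app_db_user};
  SQL
}

output "db_bootstrap_sql_sketch" {
  value = local.bootstrap_sql_sketch
}
```

Membership in `rds_iam` is what tells Aurora to accept an IAM token instead of a password for that user.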
|
||||
|
||||
The same apply also runs the prisma schema migration via the existing
|
||||
`litellm-migrations` task definition, and the gateway/backend services
|
||||
`depends_on` the migration so they don't start until the schema is in place.
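
The ordering described above might be wired up roughly as follows. This is a minimal sketch: `terraform_data.migration` is named in the commit message, but the resource addresses `aws_ecs_task_definition.migrations`, `aws_subnet.private`, and `aws_security_group.tasks` are assumptions, and the real provisioner may differ.

```hcl
resource "terraform_data" "migration" {
  # Re-run the migration whenever the task definition changes (e.g. new image).
  triggers_replace = [aws_ecs_task_definition.migrations.arn]

  provisioner "local-exec" {
    command = <<-EOT
      set -eu
      # Launch the one-off task and capture its ARN.
      TASK_ARN=$(aws ecs run-task \
        --cluster "${aws_ecs_cluster.this.name}" \
        --task-definition "${aws_ecs_task_definition.migrations.arn}" \
        --launch-type FARGATE \
        --network-configuration "awsvpcConfiguration={subnets=[${join(",", aws_subnet.private[*].id)}],securityGroups=[${aws_security_group.tasks.id}],assignPublicIp=DISABLED}" \
        --query 'tasks[0].taskArn' --output text)
      # Block until the task stops, then fail the apply on a non-zero exit code.
      aws ecs wait tasks-stopped --cluster "${aws_ecs_cluster.this.name}" --tasks "$TASK_ARN"
      EXIT_CODE=$(aws ecs describe-tasks --cluster "${aws_ecs_cluster.this.name}" --tasks "$TASK_ARN" \
        --query 'tasks[0].containers[0].exitCode' --output text)
      test "$EXIT_CODE" = "0"
    EOT
  }
}

# The gateway and backend services in ecs.tf would then declare:
#   depends_on = [terraform_data.migration]
```

Because the provisioner blocks until the task has stopped, Terraform does not begin creating the dependent services until the schema is in place.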
|
||||
|
||||
At runtime, the proxy assembles `DATABASE_URL` from `DATABASE_HOST/PORT/USER/NAME`
|
||||
plus a short-lived IAM token — see `litellm/proxy/auth/rds_iam_token.py`. The
|
||||
task role has `rds-db:connect` scoped to the IAM-authed user on the cluster.
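
A task-role policy scoped this way could look like the sketch below. The ARN shape (`dbuser:<resource id>/<db user>`) is the standard one for RDS IAM authentication; the resource addresses `aws_rds_cluster.this` and `aws_iam_role.task` and the variable `db_app_user` are assumptions about how `rds.tf` and `iam.tf` name things.

```hcl
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "rds_connect" {
  statement {
    effect  = "Allow"
    actions = ["rds-db:connect"]

    # Scoped to one database user on one cluster, not the whole account.
    resources = [
      "arn:aws:rds-db:${var.region}:${data.aws_caller_identity.current.account_id}:dbuser:${aws_rds_cluster.this.cluster_resource_id}/${var.db_app_user}",
    ]
  }
}

resource "aws_iam_role_policy" "task_rds_connect" {
  name   = "rds-iam-connect"
  role   = aws_iam_role.task.id
  policy = data.aws_iam_policy_document.rds_connect.json
}
```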
|
||||
|
||||
**Break-glass.** If you need to run the bootstrap or migration by hand (e.g.,
|
||||
to re-apply against an externally provisioned cluster), `db_bootstrap_sql` and
|
||||
`migration_run_command` are still exposed as outputs.
|
||||
|
||||
**Prerequisite.** `terraform apply` shells out to `aws ecs run-task` /
|
||||
`aws ecs wait` in `local-exec` provisioners, so the machine running terraform
|
||||
needs the `aws` CLI installed and authenticated.
|
||||
|
||||
## Configuring the proxy
|
||||
|
||||
### `proxy_config` (preferred)
|
||||
|
||||
Mirrors the helm chart's `gateway.config.proxy_config`. The map is YAML-encoded
|
||||
and base64-passed to gateway, backend, and the migration task; each container
|
||||
decodes it to `/tmp/litellm-config.yaml` at startup and sets `CONFIG_FILE_PATH`
|
||||
to match.
|
||||
|
||||
```hcl
|
||||
proxy_config = {
|
||||
model_list = [
|
||||
{
|
||||
model_name = "gpt-4o"
|
||||
litellm_params = {
|
||||
model = "openai/gpt-4o"
|
||||
api_key = "os.environ/OPENAI_API_KEY"
|
||||
}
|
||||
},
|
||||
]
|
||||
general_settings = {
|
||||
master_key = "os.environ/LITELLM_MASTER_KEY"
|
||||
database_url = "os.environ/DATABASE_URL"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
LiteLLM resolves `os.environ/<NAME>` references in the YAML against the
|
||||
container's environment. That means provider API keys belong in
|
||||
`*_extra_secrets` (next section), and your YAML just references them by name.
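
The encode-and-decode flow described at the top of this section can be sketched as below. This is a standalone illustration: the env var name `LITELLM_PROXY_CONFIG_B64`, the local names, and the `type = any` declaration are assumptions, and the stack's actual variable is typed.

```hcl
variable "proxy_config" {
  type    = any
  default = {}
}

locals {
  # HCL map -> YAML text -> base64, so it survives as a single env var value.
  proxy_config_b64 = base64encode(yamlencode(var.proxy_config))

  # Env entries a container definition could carry (hypothetical names).
  proxy_config_env = [
    { name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 },
    { name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
  ]

  # Shell fragment a container entrypoint could run before starting the proxy.
  decode_config_cmd = "printf '%s' \"$LITELLM_PROXY_CONFIG_B64\" | base64 -d > \"$CONFIG_FILE_PATH\""
}
```

Base64 avoids quoting problems with multi-line YAML inside the JSON container definition.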
|
||||
|
||||
### Extra env vars
|
||||
|
||||
Non-sensitive plaintext (feature flags, observability hosts, etc.):
|
||||
|
||||
```hcl
|
||||
gateway_extra_env = {
|
||||
LANGFUSE_HOST = "https://us.cloud.langfuse.com"
|
||||
}
|
||||
backend_extra_env = {
|
||||
STORE_MODEL_IN_DB = "True"
|
||||
}
|
||||
```
|
||||
|
||||
### Extra secrets (API keys)
|
||||
|
||||
Sensitive values — provider API keys, third-party tokens — live in **existing
|
||||
Secrets Manager secrets**. Reference them by ARN:
|
||||
|
||||
```hcl
|
||||
gateway_extra_secrets = {
|
||||
OPENAI_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
|
||||
ANTHROPIC_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:anthropic-api-key-GhIjKl"
|
||||
}
|
||||
```
|
||||
|
||||
What happens under the hood:
|
||||
- The execution role auto-gains `secretsmanager:GetSecretValue` on every ARN
|
||||
listed here.
|
||||
- ECS resolves each secret at task launch and injects its value into the
|
||||
container as the env var named on the left.
|
||||
- The `proxy_config` YAML references the resulting env var via
|
||||
`os.environ/OPENAI_API_KEY`.
|
||||
|
||||
To pluck a single field out of a JSON secret, use ECS's `:fieldName::` suffix:
|
||||
|
||||
```hcl
|
||||
gateway_extra_secrets = {
|
||||
OPENAI_API_KEY = "arn:…:secret:provider-keys-AbCdEf:openai_api_key::"
|
||||
}
|
||||
```
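
The injection steps above might translate into Terraform roughly as follows. This is a sketch, not the stack's `iam.tf` / `ecs.tf`: the local and data source names are hypothetical. It strips any `:fieldName::` suffix before building the IAM resource list, since the policy needs the bare secret ARN.

```hcl
variable "gateway_extra_secrets" {
  type    = map(string)
  default = {}
}

locals {
  # ECS container-definition "secrets" entries: env var name -> valueFrom.
  gateway_container_secrets = [
    for name, value_from in var.gateway_extra_secrets :
    { name = name, valueFrom = value_from }
  ]

  # A secret ARN has seven colon-separated parts; anything after that is the
  # optional json-key / version-stage / version-id suffix.
  gateway_secret_arns = distinct([
    for value_from in values(var.gateway_extra_secrets) :
    join(":", slice(split(":", value_from), 0, 7))
  ])
}

data "aws_iam_policy_document" "execution_extra_secrets" {
  count = length(local.gateway_secret_arns) > 0 ? 1 : 0

  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = local.gateway_secret_arns
  }
}
```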
|
||||
|
||||
To create the secret beforehand:
|
||||
|
||||
```bash
|
||||
aws secretsmanager create-secret \
|
||||
--name openai-api-key \
|
||||
--secret-string "sk-proj-..."
|
||||
```
|
||||
|
||||
## Tenant deployment
|
||||
|
||||
Every resource the stack creates is named `${tenant}-litellm-${env}` (or
|
||||
that plus a per-resource suffix), so multiple tenants and multiple
|
||||
environments coexist in the same account as long as the `(tenant, env)`
|
||||
pair differs:
|
||||
|
||||
| `tenant` | `env` | Example resource name |
|
||||
| -------- | ------- | ---------------------------------- |
|
||||
| `acme` | `stage` | `acme-litellm-stage-gateway` |
|
||||
| `acme` | `prod` | `acme-litellm-prod-master-key` |
|
||||
| `globex` | `dev` | `globex-litellm-dev-license` |
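
The naming scheme in the table could be derived as in this sketch. `local.name` is the prefix `alb.tf` uses; the validation rule is an assumption added for illustration.

```hcl
variable "tenant" {
  type = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]*$", var.tenant))
    error_message = "tenant must be a lowercase slug (letters, digits, hyphens)."
  }
}

variable "env" {
  type = string
}

locals {
  # Shared prefix; per-resource suffixes such as "-gateway" are appended to it.
  # ALB and target group names are capped at 32 characters, so keep the
  # tenant and env slugs short.
  name = "${var.tenant}-litellm-${var.env}"
}
```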
|
||||
|
||||
For a per-tenant instance, the only inputs that change are the tenant
|
||||
slug, env, and the two pre-issued secrets:
|
||||
|
||||
```bash
|
||||
export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
|
||||
export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE
|
||||
|
||||
terraform apply \
|
||||
-var "region=us-west-2" \
|
||||
-var 'azs=["us-west-2a","us-west-2b"]' \
|
||||
-var "tenant=acme" \
|
||||
-var "env=stage"
|
||||
```
|
||||
|
||||
Both `litellm_master_key` and `litellm_license` are optional:
|
||||
- Omit `litellm_master_key` → the stack auto-generates a random `sk-…`
|
||||
value (trial/dev path).
|
||||
- Omit `litellm_license` → no license secret is created and gateway/
|
||||
backend run without `LITELLM_LICENSE` (OSS-only).
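
One way the optional master key could be resolved is sketched below. The empty-string default and the resource and local names are assumptions; the stack's `secrets.tf` may do this differently.

```hcl
variable "litellm_master_key" {
  type      = string
  default   = ""
  sensitive = true
}

# Always generated; only used when no key is supplied.
resource "random_password" "master_key" {
  length  = 48
  special = false
}

locals {
  # A supplied key wins; otherwise fall back to a generated "sk-..." value.
  master_key = var.litellm_master_key != "" ? var.litellm_master_key : "sk-${random_password.master_key.result}"
}
```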
|
||||
|
||||
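Use `TF_VAR_*` env vars rather than tfvars files for these: values written
to a tfvars file sit on disk in plaintext and tend to leak into copied or
committed example files. Values supplied either way can still be recorded in
`terraform.tfstate`, so keep state in a protected backend.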
|
||||
|
||||
## Quick start
|
||||
|
||||
```bash
|
||||
cd terraform/litellm/aws
|
||||
cp terraform.tfvars.example terraform.tfvars
|
||||
# Edit: region, tenant, env, azs, *_image, proxy_config, gateway_extra_secrets.
|
||||
|
||||
terraform init
|
||||
terraform apply
|
||||
```
|
||||
|
||||
That single apply provisions everything, runs the DB user bootstrap, runs the
|
||||
schema migration, and only then starts the gateway/backend services. When it
|
||||
returns, the stack is serving traffic.
|
||||
|
||||
```bash
|
||||
terraform output alb_url
|
||||
# UI login: admin / <master key>
|
||||
aws secretsmanager get-secret-value \
|
||||
--secret-id "$(terraform output -raw master_key_secret_arn)" \
|
||||
--query SecretString --output text
|
||||
```
|
||||
|
||||
## Image pulls
|
||||
|
||||
The defaults pull from `ghcr.io/berriai/litellm-<component>:v1.86.0-dev`,
|
||||
which is anonymous-readable. There are four images: `litellm-gateway`,
|
||||
`litellm-backend`, `litellm-ui`, and `litellm-migrations` (slim image used
|
||||
only by the one-off migration task — runs `prisma migrate deploy` against
|
||||
the writer DB and exits). Bump them together when bumping LiteLLM. To pull
|
||||
from a private registry:
|
||||
|
||||
- **ECR (same account)**: the execution role already has
|
||||
`AmazonECSTaskExecutionRolePolicy`, which grants ECR pull for repos in
|
||||
the same account. No extra config needed.
|
||||
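- **ECR (cross-account)**: the execution role's managed policy already allows
  the ECR pull actions, so what is missing is a repository policy on the
  foreign repo that grants this account (or the execution role)
  `ecr:BatchGetImage`, `ecr:GetDownloadUrlForLayer`, and
  `ecr:BatchCheckLayerAvailability`.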
|
||||
- **Other private registries** (GHCR with a PAT, Docker Hub, …): create a
  Secrets Manager secret holding `{"username":"<user>","password":"<token>"}`,
  grant the execution role `secretsmanager:GetSecretValue` on it, and set
  `repositoryCredentials.credentialsParameter` to the secret ARN on the task
  def container (extend `ecs.tf` accordingly).
|
||||
|
||||
## TLS
|
||||
|
||||
`terraform plan` refuses to provision an HTTP-only ALB by default — TLS
|
||||
is the supported posture. Two paths:
|
||||
|
||||
**Production / staging — provide an ACM certificate:**
|
||||
|
||||
1. Create or import an ACM cert in `var.region` covering the DNS name you
|
||||
plan to point at the ALB.
|
||||
2. Set `acm_certificate_arn = "arn:aws:acm:..."` in tfvars and apply.
|
||||
|
||||
Result: a 443 listener carries the path-routing rules; the 80 listener
|
||||
serves a permanent 301 redirect to HTTPS, so HTTP clients are
|
||||
automatically upgraded.
|
||||
|
||||
**Trial / dev — explicitly opt into HTTP-only:**
|
||||
|
||||
Set `allow_plaintext_alb = true` in tfvars. Without this flag, plan fails
|
||||
with a clear error pointing at the precondition. Intended for short-lived
|
||||
trial / dev stacks only.
|
||||
|
||||
## Storage and database retention
|
||||
|
||||
Two opt-in tripwires guard against accidental data loss on
|
||||
`terraform destroy`:
|
||||
|
||||
- **`skip_final_snapshot`** (Aurora; default `false`) — destroying the
|
||||
cluster takes a `<cluster>-final-<short-sha>` snapshot first.
|
||||
- **`s3_force_destroy`** (S3 bucket holding request log archives,
|
||||
`/v1/files` content, and the S3 cache backend; default `false`) —
|
||||
`terraform destroy` against a non-empty bucket fails.
|
||||
|
||||
Flip either to `true` only for ephemeral / CI stacks where you accept
|
||||
losing the contents.
|
||||
|
||||
## Files
|
||||
|
||||
| File | What's in it |
|
||||
| ----------------- | --------------------------------------------------------------------- |
|
||||
| `versions.tf` | Terraform + provider version constraints |
|
||||
| `providers.tf` | AWS provider (region + default tags) |
|
||||
| `variables.tf` | All input variables |
|
||||
| `locals.tf` | Path-prefix lists for ALB routing (mirror of `helm/.../ingress.yaml`) |
|
||||
| `network.tf` | VPC, subnets, IGW, NAT, route tables, security groups |
|
||||
| `secrets.tf` | Secrets Manager entries + random passwords |
|
||||
| `rds.tf` | Aurora Postgres cluster + writer / reader instances |
|
||||
| `redis.tf` | ElastiCache Redis |
|
||||
| `s3.tf` | S3 bucket + task-role policy scoped to it |
|
||||
| `iam.tf` | Task execution + task roles, including `rds-db:connect` |
|
||||
| `ecs.tf` | ECS cluster, task definitions, services for the three components |
|
||||
| `alb.tf` | ALB, listener, target groups, path-routing rules |
|
||||
| `migrations.tf` | One-off migration task definition |
| `bootstrap.tf`    | One-shot Fargate task that creates the IAM-authed DB user             |
| `autoscaling.tf`  | Application Auto Scaling targets + policies for the three services    |
|
||||
| `outputs.tf` | DNS name, secret ARN, bootstrap SQL, migration `run-task` command |
|
||||
179
terraform/litellm/aws/alb.tf
Normal file
@@ -0,0 +1,179 @@
|
||||
resource "aws_lb" "this" {
|
||||
name = local.name
|
||||
load_balancer_type = "application"
|
||||
internal = false
|
||||
security_groups = [aws_security_group.alb.id]
|
||||
subnets = aws_subnet.public[*].id
|
||||
|
||||
idle_timeout = 120
|
||||
}
|
||||
|
||||
locals {
|
||||
# When an ACM cert ARN is provided we provision a 443 listener carrying
|
||||
# the path-routing rules and downgrade the 80 listener to a redirect.
|
||||
tls_enabled = var.acm_certificate_arn != ""
|
||||
rules_listener_arn = local.tls_enabled ? aws_lb_listener.https[0].arn : aws_lb_listener.http.arn
|
||||
}
|
||||
|
||||
# Target groups — one per component. IP target type because Fargate tasks
|
||||
# are addressed by ENI IP, not instance.
|
||||
|
||||
resource "aws_lb_target_group" "gateway" {
|
||||
name = "${local.name}-gateway"
|
||||
port = 4000
|
||||
protocol = "HTTP"
|
||||
target_type = "ip"
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
health_check {
|
||||
path = "/health/readiness"
|
||||
matcher = "200-299"
|
||||
interval = 30
|
||||
timeout = 10
|
||||
healthy_threshold = 2
|
||||
unhealthy_threshold = 3
|
||||
}
|
||||
|
||||
deregistration_delay = 30
|
||||
}
|
||||
|
||||
resource "aws_lb_target_group" "backend" {
|
||||
name = "${local.name}-backend"
|
||||
port = 4001
|
||||
protocol = "HTTP"
|
||||
target_type = "ip"
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
health_check {
|
||||
path = "/health/readiness"
|
||||
matcher = "200-299"
|
||||
interval = 30
|
||||
timeout = 10
|
||||
healthy_threshold = 2
|
||||
unhealthy_threshold = 3
|
||||
}
|
||||
|
||||
deregistration_delay = 30
|
||||
}
|
||||
|
||||
resource "aws_lb_target_group" "ui" {
|
||||
name = "${local.name}-ui"
|
||||
port = 3000
|
||||
protocol = "HTTP"
|
||||
target_type = "ip"
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
health_check {
|
||||
path = "/healthz"
|
||||
matcher = "200-299"
|
||||
interval = 30
|
||||
timeout = 5
|
||||
healthy_threshold = 2
|
||||
unhealthy_threshold = 3
|
||||
}
|
||||
|
||||
deregistration_delay = 30
|
||||
}
|
||||
|
||||
# HTTP listener. When TLS is enabled this only serves a permanent
|
||||
# 301 redirect to HTTPS; otherwise it carries the path-routing rules
|
||||
# (default → backend).
|
||||
resource "aws_lb_listener" "http" {
|
||||
load_balancer_arn = aws_lb.this.arn
|
||||
port = 80
|
||||
protocol = "HTTP"
|
||||
|
||||
default_action {
|
||||
type = local.tls_enabled ? "redirect" : "forward"
|
||||
|
||||
dynamic "redirect" {
|
||||
for_each = local.tls_enabled ? [1] : []
|
||||
content {
|
||||
port = "443"
|
||||
protocol = "HTTPS"
|
||||
status_code = "HTTP_301"
|
||||
}
|
||||
}
|
||||
|
||||
target_group_arn = local.tls_enabled ? null : aws_lb_target_group.backend.arn
|
||||
}
|
||||
|
||||
# Default-deny on the HTTP-only path: TLS is the supported posture.
|
||||
# Operators must either supply an ACM cert or explicitly opt in.
|
||||
lifecycle {
|
||||
precondition {
|
||||
condition = local.tls_enabled || var.allow_plaintext_alb
|
||||
error_message = "ALB has no HTTPS listener. Either set `acm_certificate_arn` to enable TLS, or set `allow_plaintext_alb = true` to opt into HTTP-only (trial / dev only)."
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# HTTPS listener. Only created when an ACM cert ARN is supplied — terminates
|
||||
# TLS and carries the same default + path-routing rules.
|
||||
resource "aws_lb_listener" "https" {
|
||||
count = local.tls_enabled ? 1 : 0
|
||||
load_balancer_arn = aws_lb.this.arn
|
||||
port = 443
|
||||
protocol = "HTTPS"
|
||||
ssl_policy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
|
||||
certificate_arn = var.acm_certificate_arn
|
||||
|
||||
default_action {
|
||||
type = "forward"
|
||||
target_group_arn = aws_lb_target_group.backend.arn
|
||||
}
|
||||
}
|
||||
|
||||
# UI exact paths (/, /favicon.ico, /ui) — priority 10.
|
||||
resource "aws_lb_listener_rule" "ui_exact" {
|
||||
listener_arn = local.rules_listener_arn
|
||||
priority = 10
|
||||
|
||||
action {
|
||||
type = "forward"
|
||||
target_group_arn = aws_lb_target_group.ui.arn
|
||||
}
|
||||
|
||||
condition {
|
||||
path_pattern {
|
||||
values = local.ui_exact_paths
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# UI prefix paths (/_next/*, /litellm-asset-prefix/*, /assets/*, /ui/*) — priority 20.
|
||||
resource "aws_lb_listener_rule" "ui_prefix" {
|
||||
listener_arn = local.rules_listener_arn
|
||||
priority = 20
|
||||
|
||||
action {
|
||||
type = "forward"
|
||||
target_group_arn = aws_lb_target_group.ui.arn
|
||||
}
|
||||
|
||||
condition {
|
||||
path_pattern {
|
||||
values = local.ui_path_prefixes
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Gateway prefix rules — one per chunk-of-5 because ALB caps a path-pattern
|
||||
# condition at 5 values. Priorities 100..(100 + N - 1) for N chunks.
|
||||
resource "aws_lb_listener_rule" "gateway" {
|
||||
for_each = { for idx, chunk in local.gateway_path_chunks : idx => chunk }
|
||||
|
||||
listener_arn = local.rules_listener_arn
|
||||
priority = 100 + tonumber(each.key)
|
||||
|
||||
action {
|
||||
type = "forward"
|
||||
target_group_arn = aws_lb_target_group.gateway.arn
|
||||
}
|
||||
|
||||
condition {
|
||||
path_pattern {
|
||||
values = each.value
|
||||
}
|
||||
}
|
||||
}
|
||||
105
terraform/litellm/aws/autoscaling.tf
Normal file
@@ -0,0 +1,105 @@
|
||||
# Application Auto Scaling for the three ECS services. Mirrors the HPA values
|
||||
# baked into the helm chart at helm/litellm/values.yaml:
|
||||
#
|
||||
# gateway: 1-10 replicas, target 70% CPU + 80% memory
|
||||
# backend: 1-4 replicas, target 70% CPU
|
||||
# ui: 1-3 replicas, target 80% CPU (off by default; nginx static export)
|
||||
#
|
||||
# Each service gets a scalable target plus one target-tracking policy per metric.
|
||||
# When autoscaling is disabled (count=0) the resources collapse cleanly out of
|
||||
# the plan; the service's desired_count from ecs.tf stays in effect.
|
||||
|
||||
# ---------- Gateway ----------
|
||||
resource "aws_appautoscaling_target" "gateway" {
|
||||
count = var.gateway_autoscaling_enabled ? 1 : 0
|
||||
service_namespace = "ecs"
|
||||
resource_id = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.gateway.name}"
|
||||
scalable_dimension = "ecs:service:DesiredCount"
|
||||
min_capacity = var.gateway_min_capacity
|
||||
max_capacity = var.gateway_max_capacity
|
||||
}
|
||||
|
||||
resource "aws_appautoscaling_policy" "gateway_cpu" {
|
||||
count = var.gateway_autoscaling_enabled ? 1 : 0
|
||||
name = "${local.name}-gateway-cpu"
|
||||
policy_type = "TargetTrackingScaling"
|
||||
service_namespace = aws_appautoscaling_target.gateway[0].service_namespace
|
||||
resource_id = aws_appautoscaling_target.gateway[0].resource_id
|
||||
scalable_dimension = aws_appautoscaling_target.gateway[0].scalable_dimension
|
||||
|
||||
target_tracking_scaling_policy_configuration {
|
||||
predefined_metric_specification {
|
||||
predefined_metric_type = "ECSServiceAverageCPUUtilization"
|
||||
}
|
||||
target_value = var.gateway_cpu_target
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_appautoscaling_policy" "gateway_memory" {
|
||||
# Memory policy is optional; set gateway_memory_target = 0 to omit it.
|
||||
count = var.gateway_autoscaling_enabled && var.gateway_memory_target > 0 ? 1 : 0
|
||||
name = "${local.name}-gateway-memory"
|
||||
policy_type = "TargetTrackingScaling"
|
||||
service_namespace = aws_appautoscaling_target.gateway[0].service_namespace
|
||||
resource_id = aws_appautoscaling_target.gateway[0].resource_id
|
||||
scalable_dimension = aws_appautoscaling_target.gateway[0].scalable_dimension
|
||||
|
||||
target_tracking_scaling_policy_configuration {
|
||||
predefined_metric_specification {
|
||||
predefined_metric_type = "ECSServiceAverageMemoryUtilization"
|
||||
}
|
||||
target_value = var.gateway_memory_target
|
||||
}
|
||||
}
|
||||
|
||||
# ---------- Backend ----------
|
||||
resource "aws_appautoscaling_target" "backend" {
|
||||
count = var.backend_autoscaling_enabled ? 1 : 0
|
||||
service_namespace = "ecs"
|
||||
resource_id = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.backend.name}"
|
||||
scalable_dimension = "ecs:service:DesiredCount"
|
||||
min_capacity = var.backend_min_capacity
|
||||
max_capacity = var.backend_max_capacity
|
||||
}
|
||||
|
||||
resource "aws_appautoscaling_policy" "backend_cpu" {
|
||||
count = var.backend_autoscaling_enabled ? 1 : 0
|
||||
name = "${local.name}-backend-cpu"
|
||||
policy_type = "TargetTrackingScaling"
|
||||
service_namespace = aws_appautoscaling_target.backend[0].service_namespace
|
||||
resource_id = aws_appautoscaling_target.backend[0].resource_id
|
||||
scalable_dimension = aws_appautoscaling_target.backend[0].scalable_dimension
|
||||
|
||||
target_tracking_scaling_policy_configuration {
|
||||
predefined_metric_specification {
|
||||
predefined_metric_type = "ECSServiceAverageCPUUtilization"
|
||||
}
|
||||
target_value = var.backend_cpu_target
|
||||
}
|
||||
}
|
||||
|
||||
# ---------- UI ----------
|
||||
resource "aws_appautoscaling_target" "ui" {
|
||||
count = var.ui_autoscaling_enabled ? 1 : 0
|
||||
service_namespace = "ecs"
|
||||
resource_id = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.ui.name}"
|
||||
scalable_dimension = "ecs:service:DesiredCount"
|
||||
min_capacity = var.ui_min_capacity
|
||||
max_capacity = var.ui_max_capacity
|
||||
}
|
||||
|
||||
resource "aws_appautoscaling_policy" "ui_cpu" {
|
||||
count = var.ui_autoscaling_enabled ? 1 : 0
|
||||
name = "${local.name}-ui-cpu"
|
||||
policy_type = "TargetTrackingScaling"
|
||||
service_namespace = aws_appautoscaling_target.ui[0].service_namespace
|
||||
resource_id = aws_appautoscaling_target.ui[0].resource_id
|
||||
scalable_dimension = aws_appautoscaling_target.ui[0].scalable_dimension
|
||||
|
||||
target_tracking_scaling_policy_configuration {
|
||||
predefined_metric_specification {
|
||||
predefined_metric_type = "ECSServiceAverageCPUUtilization"
|
||||
}
|
||||
target_value = var.ui_cpu_target
|
||||
}
|
||||
}
|
||||
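The header comment of autoscaling.tf lists the HPA values the stack mirrors. As a rough sketch, a `terraform.tfvars` fragment restating those numbers might look like the following. The variable names come from the resources above; the types (bool and number) and the choice to enable gateway and backend scaling are assumptions, since variables.tf and its defaults are not shown here.

```hcl
# terraform.tfvars (illustrative values only, restating the header comment)

gateway_autoscaling_enabled = true
gateway_min_capacity        = 1
gateway_max_capacity        = 10
gateway_cpu_target          = 70
gateway_memory_target       = 80 # set to 0 to omit the memory policy

backend_autoscaling_enabled = true
backend_min_capacity        = 1
backend_max_capacity        = 4
backend_cpu_target          = 70

# The header comment describes UI autoscaling as off by default.
ui_autoscaling_enabled = false
ui_min_capacity        = 1
ui_max_capacity        = 3
ui_cpu_target          = 80
```

With a flag set to `false`, the matching target and policy resources get `count = 0`, and the service keeps the `desired_count` configured in ecs.tf.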
185
terraform/litellm/aws/bootstrap.tf
Normal file
@ -0,0 +1,185 @@
|
||||
# Auto-runs the two manual steps that used to follow `terraform apply`:
|
||||
#
|
||||
# 1. Create the IAM-authed Postgres user (litellm_app) — uses the postgres:16-alpine
|
||||
# image with the master password from Secrets Manager.
|
||||
# 2. Run prisma migrate deploy — reuses the existing aws_ecs_task_definition
|
||||
# .migrations task def from migrations.tf.
|
||||
#
|
||||
# Both are invoked via `terraform_data` provisioners. Gateway/backend services
|
||||
# in ecs.tf depend on `terraform_data.migration`, so on a fresh apply they
|
||||
# don't start until the schema is in place — no crash-loop window.
|
||||
#
|
||||
# Triggers:
|
||||
# - bootstrap_db re-runs if the Aurora cluster is recreated, or if the
|
||||
# bootstrap task definition (image/SQL) changes.
|
||||
# - migration re-runs if the migration task def revision changes (e.g., new
|
||||
# backend image with new prisma migration files) or if bootstrap re-ran.
|
||||
#
|
||||
# Requires `bash` and the `aws` CLI on the machine running terraform. For laptop usage that's
|
||||
# fine; for CI/CD the runner image needs `aws`.
|
||||
|
||||
# ---------- IAM ----------
|
||||
# Execution role can already read the runtime secrets (master_key, user-provided
|
||||
# extras — see iam.tf). The DB master password lives in a separate secret used
|
||||
# only here, so we grant access in an additive policy.
|
||||
resource "aws_iam_policy" "bootstrap_secrets" {
|
||||
name = "${local.name}-bootstrap-secrets-access"
|
||||
policy = jsonencode({
|
||||
Version = "2012-10-17"
|
||||
Statement = [{
|
||||
Effect = "Allow"
|
||||
Action = ["secretsmanager:GetSecretValue"]
|
||||
Resource = [aws_secretsmanager_secret.db_master_password.arn]
|
||||
}]
|
||||
})
|
||||
}
|
||||
|
||||
resource "aws_iam_role_policy_attachment" "task_execution_bootstrap_secrets" {
|
||||
role = aws_iam_role.task_execution.name
|
||||
policy_arn = aws_iam_policy.bootstrap_secrets.arn
|
||||
}
|
||||
|
||||
# ---------- Bootstrap task def ----------
|
||||
resource "aws_cloudwatch_log_group" "bootstrap_db" {
|
||||
name = "/ecs/${local.name}/bootstrap-db"
|
||||
retention_in_days = var.log_retention_days
|
||||
}
|
||||
|
||||
locals {
|
||||
# Idempotent: CREATE USER is wrapped in DO/EXCEPTION; GRANTs are
|
||||
# idempotent by definition (re-granting is a no-op). Safe to re-run on
|
||||
# any subsequent apply.
|
||||
bootstrap_sql = <<-SQL
|
||||
DO $$
|
||||
BEGIN
|
||||
CREATE USER ${var.db_username};
|
||||
EXCEPTION WHEN duplicate_object THEN NULL;
|
||||
END $$;
|
||||
GRANT rds_iam TO ${var.db_username};
|
||||
GRANT ALL PRIVILEGES ON DATABASE ${var.db_name} TO ${var.db_username};
|
||||
GRANT ALL ON SCHEMA public TO ${var.db_username};
|
||||
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO ${var.db_username};
|
||||
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO ${var.db_username};
|
||||
SQL
|
||||
}
|
||||
|
||||
resource "aws_ecs_task_definition" "bootstrap_db" {
|
||||
family = "${local.name}-bootstrap-db"
|
||||
network_mode = "awsvpc"
|
||||
requires_compatibilities = ["FARGATE"]
|
||||
cpu = 256
|
||||
memory = 512
|
||||
execution_role_arn = aws_iam_role.task_execution.arn
|
||||
task_role_arn = aws_iam_role.task.arn
|
||||
|
||||
container_definitions = jsonencode([{
|
||||
name = "psql"
|
||||
image = "postgres:16-alpine"
|
||||
essential = true
|
||||
|
||||
environment = [
|
||||
{ name = "PGHOST", value = aws_rds_cluster.this.endpoint },
|
||||
{ name = "PGPORT", value = tostring(aws_rds_cluster.this.port) },
|
||||
{ name = "PGUSER", value = var.db_master_username },
|
||||
{ name = "PGDATABASE", value = var.db_name },
|
||||
{ name = "BOOTSTRAP_SQL", value = local.bootstrap_sql },
|
||||
]
|
||||
secrets = [
|
||||
# `:password::` extracts the password field out of the JSON secret.
|
||||
{ name = "PGPASSWORD", valueFrom = "${aws_secretsmanager_secret.db_master_password.arn}:password::" },
|
||||
]
|
||||
|
||||
entryPoint = ["sh", "-c"]
|
||||
command = ["echo \"$BOOTSTRAP_SQL\" | psql -v ON_ERROR_STOP=1"]
|
||||
|
||||
logConfiguration = {
|
||||
logDriver = "awslogs"
|
||||
options = {
|
||||
awslogs-group = aws_cloudwatch_log_group.bootstrap_db.name
|
||||
awslogs-region = var.region
|
||||
awslogs-stream-prefix = "bootstrap"
|
||||
}
|
||||
}
|
||||
}])
|
||||
}
|
||||
|
||||
# ---------- Bootstrap trigger ----------
|
||||
resource "terraform_data" "bootstrap_db" {
|
||||
triggers_replace = {
|
||||
cluster_resource_id = aws_rds_cluster.this.cluster_resource_id
|
||||
task_def_revision = aws_ecs_task_definition.bootstrap_db.revision
|
||||
}
|
||||
|
||||
provisioner "local-exec" {
|
||||
interpreter = ["bash", "-c"]
|
||||
environment = {
|
||||
CLUSTER = aws_ecs_cluster.this.name
|
||||
TASK_DEF = aws_ecs_task_definition.bootstrap_db.arn
|
||||
SUBNETS = join(",", aws_subnet.private[*].id)
|
||||
SG = aws_security_group.tasks.id
|
||||
REGION = var.region
|
||||
LOG_GRP = aws_cloudwatch_log_group.bootstrap_db.name
|
||||
}
|
||||
command = <<-EOT
|
||||
set -euo pipefail
|
||||
task_arn=$(aws ecs run-task --region "$REGION" --cluster "$CLUSTER" \
|
||||
--launch-type FARGATE --task-definition "$TASK_DEF" \
|
||||
--network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$SG],assignPublicIp=DISABLED}" \
|
||||
--query 'tasks[0].taskArn' --output text)
|
||||
echo "bootstrap task: $task_arn"
|
||||
aws ecs wait tasks-stopped --region "$REGION" --cluster "$CLUSTER" --tasks "$task_arn"
|
||||
task_id=$(echo "$task_arn" | awk -F/ '{print $NF}')
|
||||
exit_code=$(aws ecs describe-tasks --region "$REGION" --cluster "$CLUSTER" --tasks "$task_id" \
|
||||
--query 'tasks[0].containers[0].exitCode' --output text)
|
||||
if [ "$exit_code" != "0" ]; then
|
||||
echo "Bootstrap failed (exit=$exit_code). Logs: $LOG_GRP" >&2
|
||||
exit 1
|
||||
fi
|
||||
EOT
|
||||
}
|
||||
|
||||
depends_on = [
|
||||
aws_rds_cluster_instance.writer,
|
||||
aws_iam_role_policy_attachment.task_execution_bootstrap_secrets,
|
||||
]
|
||||
}
|
||||
|
||||
# ---------- Migration trigger ----------
|
||||
# Reuses the task definition from migrations.tf — this resource just invokes
|
||||
# it and waits.
|
||||
resource "terraform_data" "migration" {
|
||||
triggers_replace = {
|
||||
task_def_revision = aws_ecs_task_definition.migrations.revision
|
||||
bootstrap_id = terraform_data.bootstrap_db.id
|
||||
}
|
||||
|
||||
provisioner "local-exec" {
|
||||
interpreter = ["bash", "-c"]
|
||||
environment = {
|
||||
CLUSTER = aws_ecs_cluster.this.name
|
||||
TASK_DEF = aws_ecs_task_definition.migrations.arn
|
||||
SUBNETS = join(",", aws_subnet.private[*].id)
|
||||
SG = aws_security_group.tasks.id
|
||||
REGION = var.region
|
||||
LOG_GRP = aws_cloudwatch_log_group.migrations.name
|
||||
}
|
||||
command = <<-EOT
|
||||
set -euo pipefail
|
||||
task_arn=$(aws ecs run-task --region "$REGION" --cluster "$CLUSTER" \
|
||||
--launch-type FARGATE --task-definition "$TASK_DEF" \
|
||||
--network-configuration "awsvpcConfiguration={subnets=[$SUBNETS],securityGroups=[$SG],assignPublicIp=DISABLED}" \
|
||||
--query 'tasks[0].taskArn' --output text)
|
||||
echo "migration task: $task_arn"
|
||||
aws ecs wait tasks-stopped --region "$REGION" --cluster "$CLUSTER" --tasks "$task_arn"
|
||||
task_id=$(echo "$task_arn" | awk -F/ '{print $NF}')
|
||||
exit_code=$(aws ecs describe-tasks --region "$REGION" --cluster "$CLUSTER" --tasks "$task_id" \
|
||||
--query 'tasks[0].containers[0].exitCode' --output text)
|
||||
if [ "$exit_code" != "0" ]; then
|
||||
echo "Migration failed (exit=$exit_code). Logs: $LOG_GRP" >&2
|
||||
exit 1
|
||||
fi
|
||||
EOT
|
||||
}
|
||||
|
||||
depends_on = [terraform_data.bootstrap_db]
|
||||
}
|
||||
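The trigger-and-chain pattern used by `terraform_data.bootstrap_db` and `terraform_data.migration` can be reduced to a minimal sketch that runs in an empty module. All names here (`example_revision`, `example_one_shot`, `example_dependent`) are hypothetical, and the `echo` stands in for the real `aws ecs run-task` script.

```hcl
# Hypothetical input standing in for a task definition revision.
variable "example_revision" {
  type    = number
  default = 1
}

resource "terraform_data" "example_one_shot" {
  # Any change to this map replaces the resource, and replacement is what
  # re-runs the creation-time provisioner below.
  triggers_replace = {
    revision = var.example_revision
  }

  provisioner "local-exec" {
    interpreter = ["bash", "-c"]
    environment = {
      REVISION = tostring(var.example_revision)
    }
    command = <<-EOT
      set -euo pipefail
      echo "running one-shot step for revision $REVISION"
    EOT
  }
}

resource "terraform_data" "example_dependent" {
  # The upstream id changes whenever the upstream resource is replaced,
  # so this resource is replaced (and would re-run its own provisioner) too.
  triggers_replace = {
    upstream_id = terraform_data.example_one_shot.id
  }

  depends_on = [terraform_data.example_one_shot]
}
```

If the provisioner exits non-zero, Terraform marks the resource as tainted, so the next apply retries the step. Applies where no trigger value changes leave both resources untouched.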
347
terraform/litellm/aws/ecs.tf
Normal file
@ -0,0 +1,347 @@
|
||||
resource "aws_ecs_cluster" "this" {
|
||||
name = local.name
|
||||
|
||||
setting {
|
||||
name = "containerInsights"
|
||||
value = "enabled"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "gateway" {
|
||||
name = "/ecs/${local.name}/gateway"
|
||||
retention_in_days = var.log_retention_days
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "backend" {
|
||||
name = "/ecs/${local.name}/backend"
|
||||
retention_in_days = var.log_retention_days
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "ui" {
|
||||
name = "/ecs/${local.name}/ui"
|
||||
retention_in_days = var.log_retention_days
|
||||
}
|
||||
|
||||
resource "aws_cloudwatch_log_group" "migrations" {
|
||||
name = "/ecs/${local.name}/migrations"
|
||||
retention_in_days = var.log_retention_days
|
||||
}
|
||||
|
||||
# Shared env block fed to gateway, backend, and the migration task. Mirrors
|
||||
# the helm chart's `litellm.serverEnv` helper on the IAM-auth branch:
|
||||
# DATABASE_URL is assembled at runtime by
|
||||
# litellm/proxy/auth/rds_iam_token.py::init_iam_db_url_from_env from
|
||||
# HOST/PORT/USER/NAME plus an IAM-signed token, so no DB password is needed
|
||||
# in the task definition.
|
||||
locals {
|
||||
shared_env = [
|
||||
{ name = "IAM_TOKEN_DB_AUTH", value = "true" },
|
||||
{ name = "DATABASE_HOST", value = aws_rds_cluster.this.endpoint },
|
||||
{ name = "DATABASE_PORT", value = tostring(aws_rds_cluster.this.port) },
|
||||
{ name = "DATABASE_USER", value = var.db_username },
|
||||
{ name = "DATABASE_NAME", value = var.db_name },
|
||||
{ name = "DATABASE_HOST_READ_REPLICA", value = aws_rds_cluster.this.reader_endpoint },
|
||||
{ name = "DATABASE_PORT_READ_REPLICA", value = tostring(aws_rds_cluster.this.port) },
|
||||
{ name = "REDIS_HOST", value = aws_elasticache_replication_group.this.primary_endpoint_address },
|
||||
{ name = "REDIS_PORT", value = tostring(aws_elasticache_replication_group.this.port) },
|
||||
# transit_encryption_enabled = true on the replication group means the
|
||||
# proxy must connect via rediss://. _redis.get_redis_url_from_environment
|
||||
# honors REDIS_SSL to flip the scheme.
|
||||
{ name = "REDIS_SSL", value = "true" },
|
||||
# S3 bucket — referenced from proxy_config via os.environ/S3_BUCKET_NAME
|
||||
# (e.g. cache backend, request log archival, /files passthrough).
|
||||
{ name = "S3_BUCKET_NAME", value = aws_s3_bucket.this.bucket },
|
||||
{ name = "S3_REGION_NAME", value = var.region },
|
||||
# boto3 inside generate_iam_auth_token reads AWS_REGION_NAME first, then
|
||||
# AWS_REGION. Set both for compatibility.
|
||||
{ name = "AWS_REGION", value = var.region },
|
||||
{ name = "AWS_REGION_NAME", value = var.region },
|
||||
]
|
||||
|
||||
shared_secrets = concat(
|
||||
[
|
||||
{ name = "LITELLM_MASTER_KEY", valueFrom = aws_secretsmanager_secret.master_key.arn },
|
||||
],
|
||||
var.litellm_license == "" ? [] : [
|
||||
{ name = "LITELLM_LICENSE", valueFrom = aws_secretsmanager_secret.license[0].arn },
|
||||
],
|
||||
)
|
||||
|
||||
# Backend-only managed secrets. UI_PASSWORD is consumed by the management
|
||||
# API (UI login flow) and has no use on the gateway data plane.
|
||||
backend_managed_secrets = var.ui_password == "" ? [] : [
|
||||
{ name = "UI_PASSWORD", valueFrom = aws_secretsmanager_secret.ui_password[0].arn },
|
||||
]
|
||||
|
||||
gateway_extra_env_list = [
|
||||
for k, v in var.gateway_extra_env : { name = k, value = v }
|
||||
]
|
||||
backend_extra_env_list = [
|
||||
for k, v in var.backend_extra_env : { name = k, value = v }
|
||||
]
|
||||
|
||||
backend_default_env = [
|
||||
{ name = "STORE_MODEL_IN_DB", value = "true" },
|
||||
]
|
||||
gateway_extra_secrets_list = [
|
||||
for k, v in var.gateway_extra_secrets : { name = k, valueFrom = v }
|
||||
]
|
||||
backend_extra_secrets_list = [
|
||||
for k, v in var.backend_extra_secrets : { name = k, valueFrom = v }
|
||||
]
|
||||
|
||||
# Mirrors the helm chart's gateway.config.create / configmap pattern.
|
||||
# ECS Fargate has no ConfigMap analogue, so we pass the YAML as a
|
||||
# base64-encoded env var and decode it at container start via a tiny
|
||||
# python shim that runs before the image's normal uvicorn entrypoint.
|
||||
proxy_config_enabled = length(keys(var.proxy_config)) > 0
|
||||
proxy_config_b64 = local.proxy_config_enabled ? base64encode(yamlencode(var.proxy_config)) : ""
|
||||
|
||||
proxy_config_env = local.proxy_config_enabled ? [
|
||||
{ name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 },
|
||||
{ name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
|
||||
] : []
|
||||
|
||||
# Gateway always needs --workers wired in (no NUM_WORKERS env var support
|
||||
# in the image entrypoint). When proxy_config is enabled we also have to
|
||||
# decode the base64 config first, so the command goes through `sh -c`;
|
||||
# otherwise we keep the image's ENTRYPOINT and only override `command`.
|
||||
gateway_uvicorn_args = "--host 0.0.0.0 --port 4000 --workers ${var.gateway_num_workers}"
|
||||
backend_uvicorn_args = "--host 0.0.0.0 --port 4001"
|
||||
|
||||
gateway_proxy_overrides = local.proxy_config_enabled ? {
|
||||
entryPoint = ["sh", "-c"]
|
||||
command = [
|
||||
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\" && exec uvicorn gateway.main:app ${local.gateway_uvicorn_args}"
|
||||
]
|
||||
} : {
|
||||
# Mirror the image's ENTRYPOINT so we can append --workers via command.
|
||||
entryPoint = ["uvicorn", "gateway.main:app"]
|
||||
command = split(" ", local.gateway_uvicorn_args)
|
||||
}
|
||||
|
||||
backend_proxy_overrides = local.proxy_config_enabled ? {
|
||||
entryPoint = ["sh", "-c"]
|
||||
command = [
|
||||
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\" && exec uvicorn backend.main:app ${local.backend_uvicorn_args}"
|
||||
]
|
||||
} : {}
|
||||
}
|
||||
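To make the config hand-off above concrete, here is a small sketch of the same `yamlencode` then `base64encode` pipeline applied to a sample config. The `example_*` names and the model entry are illustrative assumptions, and the real `proxy_config` variable may constrain its shape more tightly than a free-form object.

```hcl
locals {
  # Hypothetical proxy config; the os.environ/ prefix tells the proxy to read
  # the value from an environment variable at runtime.
  example_proxy_config = {
    model_list = [
      {
        model_name = "gpt-4o"
        litellm_params = {
          model   = "openai/gpt-4o"
          api_key = "os.environ/OPENAI_API_KEY"
        }
      },
    ]
  }

  # Same transformation the stack applies before setting LITELLM_PROXY_CONFIG_B64.
  example_proxy_config_b64 = base64encode(yamlencode(local.example_proxy_config))
}

# Decoding the value again shows the YAML the container-side shim would write
# to CONFIG_FILE_PATH.
output "example_proxy_config_yaml" {
  value = base64decode(local.example_proxy_config_b64)
}
```

Base64 keeps the multi-line YAML intact inside a single environment variable, at the cost of making the task definition harder to read in the console.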
|
||||
# ---------- Gateway ----------
|
||||
resource "aws_ecs_task_definition" "gateway" {
|
||||
family = "${local.name}-gateway"
|
||||
network_mode = "awsvpc"
|
||||
requires_compatibilities = ["FARGATE"]
|
||||
cpu = var.gateway_cpu
|
||||
memory = var.gateway_memory
|
||||
execution_role_arn = aws_iam_role.task_execution.arn
|
||||
task_role_arn = aws_iam_role.task.arn
|
||||
|
||||
container_definitions = jsonencode([
|
||||
merge(
|
||||
{
|
||||
name = "gateway"
|
||||
image = var.gateway_image
|
||||
essential = true
|
||||
|
||||
portMappings = [{ containerPort = 4000, protocol = "tcp" }]
|
||||
environment = concat(
|
||||
local.shared_env,
|
||||
local.gateway_extra_env_list,
|
||||
local.proxy_config_env,
|
||||
)
|
||||
secrets = concat(local.shared_secrets, local.gateway_extra_secrets_list)
|
||||
|
||||
# Container-level healthCheck intentionally omitted — the wolfi
|
||||
# runtime image doesn't ship curl/wget. The ALB target group polls
|
||||
# /health/readiness.
|
||||
|
||||
logConfiguration = {
|
||||
logDriver = "awslogs"
|
||||
options = {
|
||||
awslogs-group = aws_cloudwatch_log_group.gateway.name
|
||||
awslogs-region = var.region
|
||||
awslogs-stream-prefix = "gateway"
|
||||
}
|
||||
}
|
||||
},
|
||||
local.gateway_proxy_overrides,
|
||||
)
|
||||
])
|
||||
}
|
||||
|
||||
resource "aws_ecs_service" "gateway" {
|
||||
name = "${local.name}-gateway"
|
||||
cluster = aws_ecs_cluster.this.id
|
||||
task_definition = aws_ecs_task_definition.gateway.arn
|
||||
desired_count = var.gateway_desired_count
|
||||
launch_type = "FARGATE"
|
||||
|
||||
network_configuration {
|
||||
subnets = aws_subnet.private[*].id
|
||||
security_groups = [aws_security_group.tasks.id]
|
||||
assign_public_ip = false
|
||||
}
|
||||
|
||||
load_balancer {
|
||||
target_group_arn = aws_lb_target_group.gateway.arn
|
||||
container_name = "gateway"
|
||||
container_port = 4000
|
||||
}
|
||||
|
||||
deployment_minimum_healthy_percent = 50
|
||||
deployment_maximum_percent = 200
|
||||
|
||||
# desired_count is owned by Application Auto Scaling once enabled (autoscaling.tf).
|
||||
# Terraform sets the initial value from var.gateway_desired_count, then steps aside.
|
||||
lifecycle {
|
||||
ignore_changes = [desired_count]
|
||||
}
|
||||
|
||||
# Don't start until the schema migration has run. Otherwise the proxy
|
||||
# boots, Prisma fails on the missing tables, and ECS thrashes the task.
|
||||
depends_on = [
|
||||
aws_lb_listener.http,
|
||||
aws_lb_listener.https,
|
||||
terraform_data.migration,
|
||||
]
|
||||
}
|
||||
|
||||
# ---------- Backend ----------
|
||||
resource "aws_ecs_task_definition" "backend" {
|
||||
family = "${local.name}-backend"
|
||||
network_mode = "awsvpc"
|
||||
requires_compatibilities = ["FARGATE"]
|
||||
cpu = var.backend_cpu
|
||||
memory = var.backend_memory
|
||||
execution_role_arn = aws_iam_role.task_execution.arn
|
||||
task_role_arn = aws_iam_role.task.arn
|
||||
|
||||
container_definitions = jsonencode([
|
||||
merge(
|
||||
{
|
||||
name = "backend"
|
||||
image = var.backend_image
|
||||
essential = true
|
||||
|
||||
portMappings = [{ containerPort = 4001, protocol = "tcp" }]
|
||||
environment = concat(
|
||||
local.shared_env,
|
||||
local.backend_default_env,
|
||||
local.backend_extra_env_list,
|
||||
local.proxy_config_env,
|
||||
)
|
||||
secrets = concat(local.shared_secrets, local.backend_managed_secrets, local.backend_extra_secrets_list)
|
||||
|
||||
logConfiguration = {
|
||||
logDriver = "awslogs"
|
||||
options = {
|
||||
awslogs-group = aws_cloudwatch_log_group.backend.name
|
||||
awslogs-region = var.region
|
||||
awslogs-stream-prefix = "backend"
|
||||
}
|
||||
}
|
||||
},
|
||||
local.backend_proxy_overrides,
|
||||
)
|
||||
])
|
||||
}
|
||||
|
||||
resource "aws_ecs_service" "backend" {
|
||||
name = "${local.name}-backend"
|
||||
cluster = aws_ecs_cluster.this.id
|
||||
task_definition = aws_ecs_task_definition.backend.arn
|
||||
desired_count = var.backend_desired_count
|
||||
launch_type = "FARGATE"
|
||||
|
||||
network_configuration {
|
||||
subnets = aws_subnet.private[*].id
|
||||
security_groups = [aws_security_group.tasks.id]
|
||||
assign_public_ip = false
|
||||
}
|
||||
|
||||
load_balancer {
|
||||
target_group_arn = aws_lb_target_group.backend.arn
|
||||
container_name = "backend"
|
||||
container_port = 4001
|
||||
}
|
||||
|
||||
deployment_minimum_healthy_percent = 50
|
||||
deployment_maximum_percent = 200
|
||||
|
||||
lifecycle {
|
||||
ignore_changes = [desired_count]
|
||||
}
|
||||
|
||||
depends_on = [
|
||||
aws_lb_listener.http,
|
||||
aws_lb_listener.https,
|
||||
terraform_data.migration,
|
||||
]
|
||||
}
|
||||
|
||||
# ---------- UI ----------
|
||||
# task_role is deliberately the unprivileged ui_task — the UI has no DB,
|
||||
# S3, or Secrets Manager dependency, and inheriting the shared `task`
|
||||
# role would expose every data-plane secret to a compromised UI
|
||||
# container via the task metadata endpoint.
|
||||
resource "aws_ecs_task_definition" "ui" {
|
||||
family = "${local.name}-ui"
|
||||
network_mode = "awsvpc"
|
||||
requires_compatibilities = ["FARGATE"]
|
||||
cpu = var.ui_cpu
|
||||
memory = var.ui_memory
|
||||
execution_role_arn = aws_iam_role.task_execution.arn
|
||||
task_role_arn = aws_iam_role.ui_task.arn
|
||||
|
||||
container_definitions = jsonencode([
|
||||
{
|
||||
name = "ui"
|
||||
image = var.ui_image
|
||||
essential = true
|
||||
portMappings = [{ containerPort = 3000, protocol = "tcp" }]
|
||||
|
||||
logConfiguration = {
|
||||
logDriver = "awslogs"
|
||||
options = {
|
||||
awslogs-group = aws_cloudwatch_log_group.ui.name
|
||||
awslogs-region = var.region
|
||||
awslogs-stream-prefix = "ui"
|
||||
}
|
||||
}
|
||||
}
|
||||
])
|
||||
}
|
||||
|
||||
resource "aws_ecs_service" "ui" {
|
||||
name = "${local.name}-ui"
|
||||
cluster = aws_ecs_cluster.this.id
|
||||
task_definition = aws_ecs_task_definition.ui.arn
|
||||
desired_count = var.ui_desired_count
|
||||
launch_type = "FARGATE"
|
||||
|
||||
network_configuration {
|
||||
subnets = aws_subnet.private[*].id
|
||||
security_groups = [aws_security_group.tasks.id]
|
||||
assign_public_ip = false
|
||||
}
|
||||
|
||||
load_balancer {
|
||||
target_group_arn = aws_lb_target_group.ui.arn
|
||||
container_name = "ui"
|
||||
container_port = 3000
|
||||
}
|
||||
|
||||
deployment_minimum_healthy_percent = 50
|
||||
deployment_maximum_percent = 200
|
||||
|
||||
lifecycle {
|
||||
ignore_changes = [desired_count]
|
||||
}
|
||||
|
||||
depends_on = [
|
||||
aws_lb_listener.http,
|
||||
aws_lb_listener.https,
|
||||
]
|
||||
}
|
||||
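The per-component extras consumed by the `*_extra_env_list` and `*_extra_secrets_list` locals are iterated as key/value maps. A `terraform.tfvars` sketch, assuming all four variables are `map(string)`, could look like this. The account ID, secret names, and suffixes are placeholders.

```hcl
# terraform.tfvars (illustrative; ARNs below are placeholders)

gateway_extra_env = {
  LITELLM_LOG = "INFO"
}

gateway_extra_secrets = {
  # Bare secret ARN: the whole secret string becomes the env var value.
  OPENAI_API_KEY = "arn:aws:secretsmanager:us-east-1:123456789012:secret:openai-api-key-AbCdEf"

  # JSON-key form: only the "api_key" field of a JSON secret is injected.
  ANTHROPIC_API_KEY = "arn:aws:secretsmanager:us-east-1:123456789012:secret:provider-keys-AbCdEf:api_key::"
}

backend_extra_env     = {}
backend_extra_secrets = {}
```

Each map key becomes the container's environment variable name, and each secrets value is passed through unchanged as the ECS `valueFrom`.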
114
terraform/litellm/aws/iam.tf
Normal file
@ -0,0 +1,114 @@
|
||||
# ECS task execution role — used by the agent to pull images, write logs,
|
||||
# and resolve secrets at task start.
|
||||
data "aws_iam_policy_document" "task_assume" {
|
||||
statement {
|
||||
actions = ["sts:AssumeRole"]
|
||||
principals {
|
||||
type = "Service"
|
||||
identifiers = ["ecs-tasks.amazonaws.com"]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_iam_role" "task_execution" {
|
||||
name = "${local.name}-task-execution"
|
||||
assume_role_policy = data.aws_iam_policy_document.task_assume.json
|
||||
}
|
||||
|
||||
resource "aws_iam_role_policy_attachment" "task_execution" {
|
||||
role = aws_iam_role.task_execution.name
|
||||
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
|
||||
}
|
||||
|
||||
# User-provided extra secrets may be passed as the bare secret ARN
|
||||
# ("arn:aws:secretsmanager:...:secret:name-AbCdEf") or the JSON-key form
|
||||
# ECS supports — fully spelled out as
|
||||
# "arn:...:secret:name-AbCdEf:jsonKey:versionStage:versionId" with any of
|
||||
# the trailing parts blank ("...:jsonKey::" being the most common). The IAM
|
||||
# policy resource must always be the bare ARN, so we split on ':' and keep
|
||||
# the first 7 components — robust to any combination of empty/non-empty
|
||||
# version-stage/version-id suffixes that a regex would otherwise have to
|
||||
# enumerate.
|
||||
locals {
|
||||
extra_secret_value_froms = concat(
|
||||
values(var.gateway_extra_secrets),
|
||||
values(var.backend_extra_secrets),
|
||||
)
|
||||
|
||||
extra_secret_arns = distinct([
|
||||
for v in local.extra_secret_value_froms :
|
||||
join(":", slice(split(":", v), 0, 7))
|
||||
])
|
||||
}
|
||||
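The ARN trimming described above can be checked in isolation. This sketch applies the same `split` / `slice` / `join` expression to three sample `valueFrom` strings; the `example_*` names and the ARNs are placeholders.

```hcl
locals {
  example_value_froms = [
    # Bare ARN: exactly 7 colon-separated components.
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:openai-api-key-AbCdEf",
    # JSON key only, version stage and version id left blank.
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:provider-keys-AbCdEf:api_key::",
    # JSON key plus an explicit version stage.
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:provider-keys-AbCdEf:other_key:AWSCURRENT:",
  ]

  # Keep the first 7 components, then de-duplicate.
  example_bare_arns = distinct([
    for v in local.example_value_froms :
    join(":", slice(split(":", v), 0, 7))
  ])
}

# Yields two entries: the openai-api-key ARN and a single provider-keys ARN,
# because both JSON-key forms collapse to the same bare ARN.
output "example_bare_arns" {
  value = local.example_bare_arns
}
```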
|
||||
# Execution role can read the managed secrets + any caller-provided extras
|
||||
# so ECS can resolve them when launching tasks. Image pulls inherit the
|
||||
# managed AmazonECSTaskExecutionRolePolicy.
|
||||
data "aws_iam_policy_document" "secrets_access" {
|
||||
statement {
|
||||
actions = ["secretsmanager:GetSecretValue"]
|
||||
resources = concat(
|
||||
[aws_secretsmanager_secret.master_key.arn],
|
||||
aws_secretsmanager_secret.license[*].arn,
|
||||
aws_secretsmanager_secret.ui_password[*].arn,
|
||||
local.extra_secret_arns,
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_iam_policy" "secrets_access" {
|
||||
name = "${local.name}-secrets-access"
|
||||
policy = data.aws_iam_policy_document.secrets_access.json
|
||||
}
|
||||
|
||||
resource "aws_iam_role_policy_attachment" "task_execution_secrets" {
|
||||
role = aws_iam_role.task_execution.name
|
||||
policy_arn = aws_iam_policy.secrets_access.arn
|
||||
}
|
||||
|
||||
# ---------- Task role ----------
|
||||
#
|
||||
# Assumed by the running container. Gets `rds-db:connect` so the proxy can
|
||||
# mint IAM-signed Postgres tokens for the app user. Layer additional
|
||||
# policies here (e.g. Bedrock invoke, S3 read) when the proxy needs them.
|
||||
|
||||
resource "aws_iam_role" "task" {
|
||||
name = "${local.name}-task"
|
||||
assume_role_policy = data.aws_iam_policy_document.task_assume.json
|
||||
}
|
||||
|
||||
data "aws_caller_identity" "current" {}
|
||||
|
||||
data "aws_iam_policy_document" "rds_iam_connect" {
|
||||
statement {
|
||||
actions = ["rds-db:connect"]
|
||||
resources = [
|
||||
"arn:aws:rds-db:${var.region}:${data.aws_caller_identity.current.account_id}:dbuser:${aws_rds_cluster.this.cluster_resource_id}/${var.db_username}",
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_iam_policy" "rds_iam_connect" {
|
||||
name = "${local.name}-rds-iam-connect"
|
||||
policy = data.aws_iam_policy_document.rds_iam_connect.json
|
||||
}
|
||||
|
||||
resource "aws_iam_role_policy_attachment" "task_rds_iam_connect" {
|
||||
role = aws_iam_role.task.name
|
||||
policy_arn = aws_iam_policy.rds_iam_connect.arn
|
||||
}
|
||||
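The task role comment above suggests layering extra policies when the proxy needs them. One possible shape for a Bedrock invoke policy, following the same document / policy / attachment pattern as `rds_iam_connect`, is sketched below. The `bedrock_invoke` names are hypothetical and not part of the stack, and the resource scope is an assumption: it covers foundation models in `var.region` only, so inference profiles or cross-region calls would need additional ARNs.

```hcl
# Hypothetical addition; not part of the stack as committed.
data "aws_iam_policy_document" "bedrock_invoke" {
  statement {
    actions = [
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream",
    ]
    # Foundation-model ARNs carry no account ID.
    resources = ["arn:aws:bedrock:${var.region}::foundation-model/*"]
  }
}

resource "aws_iam_policy" "bedrock_invoke" {
  name   = "${local.name}-bedrock-invoke"
  policy = data.aws_iam_policy_document.bedrock_invoke.json
}

# Attached to the shared task role only, so the UI task role stays empty.
resource "aws_iam_role_policy_attachment" "task_bedrock_invoke" {
  role       = aws_iam_role.task.name
  policy_arn = aws_iam_policy.bedrock_invoke.arn
}
```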
|
||||
# ---------- UI task role ----------
|
||||
#
|
||||
# The UI is static nginx with no DB, S3, or Secrets Manager dependencies,
|
||||
# so it deliberately does NOT inherit the shared `task` role's
|
||||
# rds-db:connect / S3 / extra-secrets policies. Empty policy set — the
|
||||
# only thing exposed via the task metadata endpoint is an identity that
|
||||
# can't reach any LiteLLM data-plane resource. The shared
|
||||
# `task_execution` role still pulls the image and writes logs (its
|
||||
# credentials aren't surfaced to the container).
|
||||
|
||||
resource "aws_iam_role" "ui_task" {
|
||||
name = "${local.name}-ui-task"
|
||||
assume_role_policy = data.aws_iam_policy_document.task_assume.json
|
||||
}
|
||||
75
terraform/litellm/aws/locals.tf
Normal file
@ -0,0 +1,75 @@
|
||||
# Gateway path prefixes — mirrored verbatim from gateway/routes/allowlist.py
|
||||
# and the helm ingress in helm/litellm/templates/ingress.yaml. Anything not in
|
||||
# this list and not a UI asset path falls through to the backend (management
|
||||
# API) catch-all rule on the ALB.
|
||||
#
|
||||
# ALB listener rules cap path-pattern conditions at 5 values per rule, so we
|
||||
# chunk this list and emit one rule per chunk.
|
||||
locals {
|
||||
# Every resource the stack creates is named `<tenant>-litellm-<env>`
|
||||
# (or that with a per-resource suffix). Computed once here so the rest of
|
||||
# the stack can reference local.name.
|
||||
name = "${var.tenant}-litellm-${var.env}"
|
||||
|
||||
gateway_path_prefixes = [
|
||||
"/v1/chat/*", "/chat/*",
|
||||
"/v1/completions*", "/completions*",
|
||||
"/v1/embeddings*", "/embeddings*",
|
||||
"/v1/moderations*", "/moderations*",
|
||||
"/v1/audio/*", "/audio/*",
|
||||
"/v1/images/*", "/images/*",
|
||||
"/v1/files*", "/files*",
|
||||
"/v1/batches*", "/batches*",
|
||||
"/v1/fine_tuning/*", "/fine_tuning/*",
|
||||
"/v1/fine-tuning/*", "/fine-tuning/*",
|
||||
"/v1/responses*", "/responses*",
|
||||
"/v1/threads*", "/threads*",
|
||||
"/v1/assistants*", "/assistants*",
|
||||
"/v1/vector_stores*", "/vector_stores*",
|
||||
"/v1/indexes*",
|
||||
"/v1/models*", "/models*",
|
||||
"/openai/*", "/engines/*",
|
||||
"/v1/messages*", "/messages*",
|
||||
"/v1/skills/*", "/v1/a2a/*",
|
||||
"/v1/rerank*", "/v2/rerank*", "/rerank*",
|
||||
"/v1/ocr*", "/ocr*",
|
||||
"/v1/rag/*", "/rag/*",
|
||||
"/v1/video/*", "/v1/videos/*", "/video/*", "/videos/*",
|
||||
"/v1/search*", "/search*",
|
||||
"/v1/containers/*", "/containers/*",
|
||||
"/v1/evals/*",
|
||||
"/v1/memory/*",
|
||||
"/queue/chat/*",
|
||||
"/v1beta/*",
|
||||
"/interactions/*",
|
||||
"/anthropic/*", "/azure/*", "/azure_ai/*", "/aws/*", "/bedrock/*",
|
||||
"/cohere/*", "/gemini/*", "/google/*",
|
||||
"/vertex_ai/*", "/vertex-ai/*",
|
||||
"/assemblyai/*", "/eu.assemblyai/*",
|
||||
"/langfuse/*", "/vllm/*",
|
||||
"/mistral/*", "/groq/*", "/voyage/*", "/cursor/*", "/milvus/*",
|
||||
"/openai_passthrough/*",
|
||||
"/toolset/*",
|
||||
"/v1/realtime*", "/realtime*",
|
||||
"/health*", "/metrics", "/test*",
|
||||
]
|
||||
|
||||
# Static UI asset prefixes — handled by the UI service, not the backend
|
||||
# catch-all. /, /favicon.ico, and /ui are also UI but added as exact rules.
|
||||
ui_path_prefixes = [
|
||||
"/litellm-asset-prefix/*",
|
||||
"/_next/*",
|
||||
"/assets/*",
|
||||
"/ui/*",
|
||||
]
|
||||
|
||||
ui_exact_paths = [
|
||||
"/",
|
||||
"/favicon.ico",
|
||||
"/ui",
|
||||
]
|
||||
|
||||
# ALB rules accept ≤ 5 path-pattern values per condition. Chunk the prefix
|
||||
# list so each chunk becomes one rule.
|
||||
gateway_path_chunks = chunklist(local.gateway_path_prefixes, 5)
|
||||
}
|
||||
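The chunking step is easy to verify on a short list. This sketch runs `chunklist` on seven sample prefixes and then keys the chunks by index, which is one way to feed a `for_each` on listener rules. The index-keyed map and the `example_*` names are assumptions; how the stack keys its rules is not shown here.

```hcl
locals {
  example_prefixes = [
    "/v1/chat/*", "/chat/*",
    "/v1/completions*", "/completions*",
    "/v1/embeddings*", "/embeddings*",
    "/v1/moderations*",
  ]

  # Seven prefixes split into one chunk of five and one chunk of two.
  example_chunks = chunklist(local.example_prefixes, 5)

  # Map keyed "0", "1", ... so each chunk can become one rule via for_each.
  example_rules = {
    for idx, chunk in local.example_chunks : tostring(idx) => chunk
  }
}

output "example_rules" {
  value = local.example_rules
}
```

Each map value can then be passed as `values` inside a rule's `path_pattern` block. Because the keys are positional, inserting a prefix in the middle of the list shifts later chunks and changes the rules that follow it.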
45
terraform/litellm/aws/migrations.tf
Normal file
@ -0,0 +1,45 @@
|
||||
# Task definition for the dedicated litellm-migrations image. Mirrors the
|
||||
# pre-install/pre-upgrade Helm hook in helm/litellm/templates/migrations-job.yaml.
|
||||
#
|
||||
# The image (built from migrations/Dockerfile) ships with
|
||||
# `ENTRYPOINT ["python3", "/app/run.py"]`. run.py assembles DATABASE_URL from
|
||||
# the discrete DATABASE_* env vars (IAM auth here) via DatabaseURLSettings,
|
||||
# then calls ProxyExtrasDBManager.setup_database() — i.e. `prisma migrate
|
||||
# deploy` with the v2 resolver and P3005/P3009/P3018 recovery. It does NOT
|
||||
# read CONFIG_FILE_PATH, the master key, or DISABLE_SCHEMA_UPDATE, so we
|
||||
# don't pass them.
|
||||
#
|
||||
# Invoked automatically by `terraform_data.migration` in bootstrap.tf during
|
||||
# every apply (after the IAM-authed user has been created). The
|
||||
# `migration_run_command` output is preserved for break-glass manual re-runs.
|
||||
resource "aws_ecs_task_definition" "migrations" {
|
||||
family = "${local.name}-migrations"
|
||||
network_mode = "awsvpc"
|
||||
requires_compatibilities = ["FARGATE"]
|
||||
# Prisma's Node + Rust engine plus the v2 migration resolver routinely
|
||||
# peaks well above 1 GiB while applying the schema. 4 GiB gives plenty
|
||||
# of headroom; CPU stays low because `prisma migrate deploy` is
|
||||
# single-threaded.
|
||||
cpu = 512
|
||||
memory = 4096
|
||||
execution_role_arn = aws_iam_role.task_execution.arn
|
||||
task_role_arn = aws_iam_role.task.arn
|
||||
|
||||
container_definitions = jsonencode([{
|
||||
name = "migrations"
|
||||
image = var.migrations_image
|
||||
essential = true
|
||||
|
||||
# No entryPoint/command override — the image's ENTRYPOINT runs run.py.
|
||||
environment = local.shared_env
|
||||
|
||||
logConfiguration = {
|
||||
logDriver = "awslogs"
|
||||
options = {
|
||||
awslogs-group = aws_cloudwatch_log_group.migrations.name
|
||||
awslogs-region = var.region
|
||||
awslogs-stream-prefix = "migrations"
|
||||
}
|
||||
}
|
||||
}])
|
||||
}
|
||||
172
terraform/litellm/aws/network.tf
Normal file
@ -0,0 +1,172 @@
|
||||
data "aws_availability_zones" "available" {
|
||||
state = "available"
|
||||
}
|
||||
|
||||
resource "aws_vpc" "this" {
|
||||
cidr_block = var.vpc_cidr
|
||||
enable_dns_hostnames = true
|
||||
enable_dns_support = true
|
||||
|
||||
tags = { Name = local.name }
|
||||
}
|
||||
|
||||
resource "aws_internet_gateway" "this" {
|
||||
vpc_id = aws_vpc.this.id
|
||||
tags = { Name = local.name }
|
||||
}
|
||||
|
||||
# Public subnets (ALB + NAT). One per AZ.
|
||||
resource "aws_subnet" "public" {
|
||||
count = length(var.azs)
|
||||
vpc_id = aws_vpc.this.id
|
||||
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
|
||||
availability_zone = var.azs[count.index]
|
||||
map_public_ip_on_launch = true
|
||||
|
||||
tags = { Name = "${local.name}-public-${var.azs[count.index]}" }
|
||||
}
|
||||
|
||||
# Private subnets (ECS tasks, RDS, ElastiCache). One per AZ, separate from
|
||||
# public range.
|
||||
resource "aws_subnet" "private" {
|
||||
count = length(var.azs)
|
||||
vpc_id = aws_vpc.this.id
|
||||
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
|
||||
availability_zone = var.azs[count.index]
|
||||
|
||||
tags = { Name = "${local.name}-private-${var.azs[count.index]}" }
|
||||
}
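# Worked example, assuming the default vpc_cidr of 10.40.0.0/16 and two AZs.
# cidrsubnet(prefix, 8, n) carves the n-th /24 out of the /16:
#
#   cidrsubnet("10.40.0.0/16", 8, 0)  # => "10.40.0.0/24"   public[0]
#   cidrsubnet("10.40.0.0/16", 8, 1)  # => "10.40.1.0/24"   public[1]
#   cidrsubnet("10.40.0.0/16", 8, 10) # => "10.40.10.0/24"  private[0]
#   cidrsubnet("10.40.0.0/16", 8, 11) # => "10.40.11.0/24"  private[1]
#
# The +10 offset keeps the two ranges apart only while length(var.azs) <= 10;
# an eleventh public subnet would land on 10.40.10.0/24 as well.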
|
||||
|
||||
resource "aws_eip" "nat" {
|
||||
domain = "vpc"
|
||||
tags = { Name = "${local.name}-nat" }
|
||||
|
||||
depends_on = [aws_internet_gateway.this]
|
||||
}
|
||||
|
||||
# Single NAT gateway in the first public subnet. For HA, replicate per AZ —
|
||||
# adds ~$30/mo per gateway, so off by default for a baseline deployment.
|
||||
resource "aws_nat_gateway" "this" {
|
||||
allocation_id = aws_eip.nat.id
|
||||
subnet_id = aws_subnet.public[0].id
|
||||
|
||||
tags = { Name = local.name }
|
||||
|
||||
depends_on = [aws_internet_gateway.this]
|
||||
}
|
||||
|
||||
resource "aws_route_table" "public" {
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
route {
|
||||
cidr_block = "0.0.0.0/0"
|
||||
gateway_id = aws_internet_gateway.this.id
|
||||
}
|
||||
|
||||
tags = { Name = "${local.name}-public" }
|
||||
}
|
||||
|
||||
resource "aws_route_table_association" "public" {
|
||||
count = length(var.azs)
|
||||
subnet_id = aws_subnet.public[count.index].id
|
||||
route_table_id = aws_route_table.public.id
|
||||
}
|
||||
|
||||
resource "aws_route_table" "private" {
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
route {
|
||||
cidr_block = "0.0.0.0/0"
|
||||
nat_gateway_id = aws_nat_gateway.this.id
|
||||
}
|
||||
|
||||
tags = { Name = "${local.name}-private" }
|
||||
}
|
||||
|
||||
resource "aws_route_table_association" "private" {
|
||||
count = length(var.azs)
|
||||
subnet_id = aws_subnet.private[count.index].id
|
||||
route_table_id = aws_route_table.private.id
|
||||
}
|
||||
|
||||
# ---------- Security groups ----------
|
||||
|
||||
resource "aws_security_group" "alb" {
|
||||
name = "${local.name}-alb"
|
||||
description = "Inbound HTTP/HTTPS to the LiteLLM ALB."
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
ingress {
|
||||
description = "HTTP from anywhere"
|
||||
from_port = 80
|
||||
to_port = 80
|
||||
protocol = "tcp"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
ingress {
|
||||
description = "HTTPS from anywhere"
|
||||
from_port = 443
|
||||
to_port = 443
|
||||
protocol = "tcp"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
|
||||
egress {
|
||||
description = "All egress"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
protocol = "-1"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_security_group" "tasks" {
|
||||
name = "${local.name}-tasks"
|
||||
description = "ECS tasks (gateway/backend/ui)."
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
ingress {
|
||||
description = "ALB to tasks"
|
||||
from_port = 0
|
||||
to_port = 65535
|
||||
protocol = "tcp"
|
||||
security_groups = [aws_security_group.alb.id]
|
||||
}
|
||||
|
||||
egress {
|
||||
description = "All egress (LLM providers, RDS, Redis)"
|
||||
from_port = 0
|
||||
to_port = 0
|
||||
protocol = "-1"
|
||||
cidr_blocks = ["0.0.0.0/0"]
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_security_group" "rds" {
|
||||
name = "${local.name}-rds"
|
||||
description = "RDS Postgres - tasks only."
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
ingress {
|
||||
description = "Postgres from ECS tasks"
|
||||
from_port = 5432
|
||||
to_port = 5432
|
||||
protocol = "tcp"
|
||||
security_groups = [aws_security_group.tasks.id]
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_security_group" "redis" {
|
||||
name = "${local.name}-redis"
|
||||
description = "ElastiCache Redis - tasks only."
|
||||
vpc_id = aws_vpc.this.id
|
||||
|
||||
ingress {
|
||||
description = "Redis from ECS tasks"
|
||||
from_port = 6379
|
||||
to_port = 6379
|
||||
protocol = "tcp"
|
||||
security_groups = [aws_security_group.tasks.id]
|
||||
}
|
||||
}
|
||||
72
terraform/litellm/aws/outputs.tf
Normal file
@ -0,0 +1,72 @@
|
||||
output "alb_dns_name" {
|
||||
description = "Public DNS name of the LiteLLM ALB."
|
||||
value = aws_lb.this.dns_name
|
||||
}
|
||||
|
||||
output "alb_url" {
|
||||
description = "Proxy URL. Switches scheme based on whether acm_certificate_arn is set; the underlying DNS name is the ALB. The dashboard is served at /, the API at /v1/*."
|
||||
value = "${local.tls_enabled ? "https" : "http"}://${aws_lb.this.dns_name}"
|
||||
}
|
||||
|
||||
output "ecs_cluster" {
|
||||
description = "ECS cluster name."
|
||||
value = aws_ecs_cluster.this.name
|
||||
}
|
||||
|
||||
output "aurora_writer_endpoint" {
|
||||
description = "Aurora writer endpoint (cluster endpoint). Used by gateway/backend as DATABASE_HOST."
|
||||
value = aws_rds_cluster.this.endpoint
|
||||
}
|
||||
|
||||
output "aurora_reader_endpoint" {
|
||||
description = "Aurora reader endpoint. Used by gateway/backend as DATABASE_HOST_READ_REPLICA."
|
||||
value = aws_rds_cluster.this.reader_endpoint
|
||||
}
|
||||
|
||||
output "redis_endpoint" {
|
||||
description = "ElastiCache Redis primary endpoint (TLS, transit_encryption_enabled = true)."
|
||||
value = "${aws_elasticache_replication_group.this.primary_endpoint_address}:${aws_elasticache_replication_group.this.port}"
|
||||
}
|
||||
|
||||
output "s3_bucket" {
|
||||
description = "S3 bucket name. Exposed to gateway + backend as S3_BUCKET_NAME / S3_REGION_NAME. Reference from proxy_config via `os.environ/S3_BUCKET_NAME`."
|
||||
value = aws_s3_bucket.this.bucket
|
||||
}
|
||||
|
||||
output "master_key_secret_arn" {
|
||||
description = "Secrets Manager ARN holding LITELLM_MASTER_KEY. Fetch with `aws secretsmanager get-secret-value --secret-id <arn>`."
|
||||
value = aws_secretsmanager_secret.master_key.arn
|
||||
}
|
||||
|
||||
output "db_master_password_secret_arn" {
|
||||
description = "Secrets Manager ARN holding the Aurora master credentials (bootstrap-only). Used to create the IAM-authed application user."
|
||||
value = aws_secretsmanager_secret.db_master_password.arn
|
||||
}
|
||||
|
||||
# Pre-baked SQL to run once as the master user, creating the IAM-authed
|
||||
# application user that gateway/backend/migration tasks will authenticate as.
|
||||
output "db_bootstrap_sql" {
|
||||
description = "Run this once as the master DB user (after the first apply) to create the IAM-authed app user."
|
||||
value = <<-SQL
|
||||
CREATE USER ${var.db_username};
|
||||
GRANT rds_iam TO ${var.db_username};
|
||||
GRANT ALL PRIVILEGES ON DATABASE ${var.db_name} TO ${var.db_username};
|
||||
GRANT ALL ON SCHEMA public TO ${var.db_username};
|
||||
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO ${var.db_username};
|
||||
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO ${var.db_username};
|
||||
SQL
|
||||
}
|
||||
|
||||
# Pre-baked command for running the one-off migration task. ECS run-task
|
||||
# needs the subnet + SG IDs at call time, so we render the full command.
|
||||
output "migration_run_command" {
|
||||
description = "Shell command that runs the one-off prisma migration task against Aurora. Run this once, after the bootstrap SQL above, before sending traffic."
|
||||
value = format(
|
||||
"aws ecs run-task --cluster %s --launch-type FARGATE --task-definition %s --network-configuration 'awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=DISABLED}' --region %s",
|
||||
aws_ecs_cluster.this.name,
|
||||
aws_ecs_task_definition.migrations.arn,
|
||||
join(",", aws_subnet.private[*].id),
|
||||
aws_security_group.tasks.id,
|
||||
var.region,
|
||||
)
|
||||
}
|
||||
13
terraform/litellm/aws/providers.tf
Normal file
@ -0,0 +1,13 @@
|
||||
provider "aws" {
|
||||
region = var.region
|
||||
|
||||
default_tags {
|
||||
tags = merge(
|
||||
{
|
||||
"litellm:stack" = local.name
|
||||
"managed-by" = "terraform"
|
||||
},
|
||||
var.tags,
|
||||
)
|
||||
}
|
||||
}
|
||||
85
terraform/litellm/aws/rds.tf
Normal file
@ -0,0 +1,85 @@
|
||||
# Aurora Postgres cluster with one writer + one reader instance, IAM
|
||||
# database authentication enabled.
|
||||
#
|
||||
# Important: enabling IAM auth on the cluster does not by itself grant any
|
||||
# Postgres user the ability to log in with an IAM token. After the first
|
||||
# apply, connect as the master user (its password lives in Secrets Manager;
# see `db_master_password_secret_arn` in outputs) and run, once:
|
||||
#
|
||||
# CREATE USER {var.db_username};
|
||||
# GRANT rds_iam TO {var.db_username};
|
||||
# GRANT ALL PRIVILEGES ON DATABASE {var.db_name} TO {var.db_username};
|
||||
# GRANT ALL ON SCHEMA public TO {var.db_username};
|
||||
#
|
||||
# After that, the gateway/backend/migration tasks (which authenticate as
|
||||
# `{var.db_username}` via IAM-signed tokens) can connect. The master user
|
||||
# itself is a superuser and Postgres refuses to grant `rds_iam` to
|
||||
# superusers — keep it for break-glass only.
|
||||
|
||||
resource "aws_db_subnet_group" "this" {
|
||||
name = "${local.name}-db"
|
||||
subnet_ids = aws_subnet.private[*].id
|
||||
}
|
||||
|
||||
resource "aws_rds_cluster_parameter_group" "this" {
|
||||
name = "${local.name}-cluster-pg"
|
||||
family = "aurora-postgresql${split(".", var.db_engine_version)[0]}"
|
||||
description = "LiteLLM Aurora Postgres cluster parameters."
|
||||
}
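# Worked example of the family expression above. "16.4" is a hypothetical
# value for var.db_engine_version, used here only for illustration:
#
#   split(".", "16.4")[0]                       # => "16"
#   "aurora-postgresql${split(".", "16.4")[0]}" # => "aurora-postgresql16"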
|
||||
|
||||
resource "aws_rds_cluster" "this" {
|
||||
cluster_identifier = local.name
|
||||
engine = "aurora-postgresql"
|
||||
engine_mode = "provisioned"
|
||||
engine_version = var.db_engine_version
|
||||
database_name = var.db_name
|
||||
master_username = var.db_master_username
|
||||
master_password = random_password.db_master_password.result
|
||||
db_subnet_group_name = aws_db_subnet_group.this.name
|
||||
vpc_security_group_ids = [aws_security_group.rds.id]
|
||||
db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.this.name
|
||||
|
||||
iam_database_authentication_enabled = true
|
||||
storage_encrypted = true
|
||||
apply_immediately = true
|
||||
|
||||
# Final-snapshot guard. With the safe default (skip_final_snapshot = false),
# `terraform destroy` takes a snapshot named `<cluster>-final-<hash>` before
# dropping the cluster. The suffix is the first 8 hex chars of md5(local.name),
# so it is stable for a given stack name: destroying a same-named stack a
# second time collides with the earlier snapshot unless that one is deleted
# first.
|
||||
skip_final_snapshot = var.skip_final_snapshot
|
||||
final_snapshot_identifier = var.skip_final_snapshot ? null : "${local.name}-final-${substr(md5(local.name), 0, 8)}"
|
||||
|
||||
backup_retention_period = 7
|
||||
preferred_backup_window = "07:00-09:00"
|
||||
}
|
||||
|
||||
resource "aws_rds_cluster_instance" "writer" {
|
||||
identifier = "${local.name}-writer"
|
||||
cluster_identifier = aws_rds_cluster.this.id
|
||||
instance_class = var.db_instance_class
|
||||
engine = aws_rds_cluster.this.engine
|
||||
engine_version = aws_rds_cluster.this.engine_version
|
||||
|
||||
publicly_accessible = false
|
||||
performance_insights_enabled = true
|
||||
|
||||
# Promotion tier 0 — first in line during failover, so this instance stays
|
||||
# the writer unless it goes unhealthy.
|
||||
promotion_tier = 0
|
||||
}
|
||||
|
||||
resource "aws_rds_cluster_instance" "reader" {
|
||||
identifier = "${local.name}-reader"
|
||||
cluster_identifier = aws_rds_cluster.this.id
|
||||
instance_class = var.db_instance_class
|
||||
engine = aws_rds_cluster.this.engine
|
||||
engine_version = aws_rds_cluster.this.engine_version
|
||||
|
||||
publicly_accessible = false
|
||||
performance_insights_enabled = true
|
||||
|
||||
# Higher promotion tier — won't be picked as writer during a failover
|
||||
# unless the writer instance itself is gone.
|
||||
promotion_tier = 15
|
||||
}
|
||||
33
terraform/litellm/aws/redis.tf
Normal file
@ -0,0 +1,33 @@
|
||||
resource "aws_elasticache_subnet_group" "this" {
|
||||
name = "${local.name}-redis"
|
||||
subnet_ids = aws_subnet.private[*].id
|
||||
}
|
||||
|
||||
# Replication group (not aws_elasticache_cluster, which is the
|
||||
# Memcached / single-node Redis resource and can't be upgraded in-place
|
||||
# to HA). With redis_num_replicas >= 1 we get automatic_failover_enabled
|
||||
# + multi_az_enabled; at_rest_encryption_enabled and
|
||||
# transit_encryption_enabled are on unconditionally so Redis traffic is
|
||||
# TLS-protected — the proxy connects via the rediss:// scheme thanks to
|
||||
# REDIS_SSL=true in the shared task env (see ecs.tf).
|
||||
resource "aws_elasticache_replication_group" "this" {
|
||||
replication_group_id = "${local.name}-redis"
|
||||
description = "LiteLLM ElastiCache Redis"
|
||||
|
||||
engine = "redis"
|
||||
engine_version = "7.1"
|
||||
node_type = var.redis_node_type
|
||||
num_cache_clusters = 1 + var.redis_num_replicas
|
||||
parameter_group_name = "default.redis7"
|
||||
port = 6379
|
||||
|
||||
subnet_group_name = aws_elasticache_subnet_group.this.name
|
||||
security_group_ids = [aws_security_group.redis.id]
|
||||
|
||||
automatic_failover_enabled = var.redis_num_replicas >= 1
|
||||
multi_az_enabled = var.redis_num_replicas >= 1
|
||||
at_rest_encryption_enabled = true
|
||||
transit_encryption_enabled = true
|
||||
|
||||
apply_immediately = true
|
||||
}
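# How var.redis_num_replicas drives the settings above (values derived from
# the expressions in this resource):
#
#   redis_num_replicas = 0  =>  num_cache_clusters = 1, failover and multi-AZ off
#   redis_num_replicas = 1  =>  num_cache_clusters = 2, failover and multi-AZ on
#   redis_num_replicas = 2  =>  num_cache_clusters = 3, failover and multi-AZ on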
|
||||
80
terraform/litellm/aws/s3.tf
Normal file
@ -0,0 +1,80 @@
|
||||
# General-purpose S3 bucket for the proxy. LiteLLM uses S3 for:
|
||||
# - Cache backend (cache_params.s3_bucket_name in proxy_config)
|
||||
# - Request log archival (S3_REQUEST_LOGS_BUCKET_NAME)
|
||||
# - /v1/files endpoint passthrough storage
|
||||
#
|
||||
# The bucket name + region are exposed to gateway + backend as S3_BUCKET_NAME
|
||||
# / S3_REGION_NAME so proxy_config can reference them via
|
||||
# `os.environ/S3_BUCKET_NAME`. The task role is scoped to this bucket only.
|
||||
|
||||
resource "random_id" "s3_suffix" {
|
||||
byte_length = 4
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket" "this" {
|
||||
bucket = "${local.name}-${random_id.s3_suffix.hex}"
|
||||
|
||||
# Default false → `terraform destroy` refuses on a non-empty bucket so
|
||||
# cached responses, archived request logs, and /v1/files storage stay put.
|
||||
# Flip to true only for ephemeral / CI stacks (`var.s3_force_destroy`).
|
||||
force_destroy = var.s3_force_destroy
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_versioning" "this" {
|
||||
bucket = aws_s3_bucket.this.id
|
||||
versioning_configuration {
|
||||
status = "Enabled"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
|
||||
bucket = aws_s3_bucket.this.id
|
||||
|
||||
rule {
|
||||
apply_server_side_encryption_by_default {
|
||||
sse_algorithm = "AES256"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_s3_bucket_public_access_block" "this" {
|
||||
bucket = aws_s3_bucket.this.id
|
||||
|
||||
block_public_acls = true
|
||||
block_public_policy = true
|
||||
ignore_public_acls = true
|
||||
restrict_public_buckets = true
|
||||
}
|
||||
|
||||
# Task role gains object-level read/write on this bucket. Bucket-level perms
|
||||
# (list/location) are also scoped to this bucket only.
|
||||
data "aws_iam_policy_document" "s3_access" {
|
||||
statement {
|
||||
actions = [
|
||||
"s3:ListBucket",
|
||||
"s3:GetBucketLocation",
|
||||
]
|
||||
resources = [aws_s3_bucket.this.arn]
|
||||
}
|
||||
|
||||
statement {
|
||||
actions = [
|
||||
"s3:GetObject",
|
||||
"s3:PutObject",
|
||||
"s3:DeleteObject",
|
||||
"s3:AbortMultipartUpload",
|
||||
"s3:ListMultipartUploadParts",
|
||||
]
|
||||
resources = ["${aws_s3_bucket.this.arn}/*"]
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_iam_policy" "s3_access" {
|
||||
name = "${local.name}-s3-access"
|
||||
policy = data.aws_iam_policy_document.s3_access.json
|
||||
}
|
||||
|
||||
resource "aws_iam_role_policy_attachment" "task_s3_access" {
|
||||
role = aws_iam_role.task.name
|
||||
policy_arn = aws_iam_policy.s3_access.arn
|
||||
}
|
||||
86
terraform/litellm/aws/secrets.tf
Normal file
@ -0,0 +1,86 @@
|
||||
resource "random_password" "master_key" {
|
||||
length = 48
|
||||
special = false
|
||||
min_lower = 4
|
||||
min_upper = 4
|
||||
min_numeric = 4
|
||||
}
|
||||
|
||||
# Master DB password — used once to bootstrap the IAM-authed application
|
||||
# user (see rds.tf header). Runtime services authenticate via IAM tokens
|
||||
# and never read this secret.
|
||||
resource "random_password" "db_master_password" {
|
||||
length = 32
|
||||
special = false
|
||||
min_lower = 4
|
||||
min_upper = 4
|
||||
min_numeric = 4
|
||||
}
|
||||
|
||||
# LITELLM_MASTER_KEY — must begin with `sk-` per the proxy's validator.
|
||||
resource "aws_secretsmanager_secret" "master_key" {
|
||||
name = "${local.name}-master-key"
|
||||
description = "LITELLM_MASTER_KEY for gateway + backend."
|
||||
recovery_window_in_days = 0
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret_version" "master_key" {
|
||||
secret_id = aws_secretsmanager_secret.master_key.id
|
||||
# When the operator passes litellm_master_key, use it verbatim. Otherwise
|
||||
# fall back to the auto-generated `sk-…` value (trial / OSS path).
|
||||
secret_string = coalesce(var.litellm_master_key, "sk-${random_password.master_key.result}")
|
||||
}
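# Sketch of the fallback above. coalesce() skips null and empty strings, so the
# "" default of var.litellm_master_key falls through to the generated key.
# The key values shown are hypothetical placeholders:
#
#   coalesce("", "sk-generated")            # => "sk-generated"
#   coalesce("sk-operator", "sk-generated") # => "sk-operator"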
|
||||
|
||||
# LITELLM_LICENSE — only created when the operator supplies one. The
|
||||
# task-execution role gets GetSecretValue via iam.tf, and gateway + backend
|
||||
# pick the env var up through shared_secrets in ecs.tf.
|
||||
resource "aws_secretsmanager_secret" "license" {
|
||||
count = var.litellm_license == "" ? 0 : 1
|
||||
|
||||
name = "${local.name}-license"
|
||||
description = "LITELLM_LICENSE for gateway + backend."
|
||||
recovery_window_in_days = 0
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret_version" "license" {
|
||||
count = var.litellm_license == "" ? 0 : 1
|
||||
|
||||
secret_id = aws_secretsmanager_secret.license[0].id
|
||||
secret_string = var.litellm_license
|
||||
}
|
||||
|
||||
# UI_PASSWORD — backend-only. Same pattern as license: only created when
|
||||
# the operator supplies one. The execution role gets GetSecretValue via
|
||||
# iam.tf, and the backend task picks the env var up through
|
||||
# backend_managed_secrets in ecs.tf.
|
||||
resource "aws_secretsmanager_secret" "ui_password" {
|
||||
count = var.ui_password == "" ? 0 : 1
|
||||
|
||||
name = "${local.name}-ui-password"
|
||||
description = "UI_PASSWORD for the backend (UI admin login)."
|
||||
recovery_window_in_days = 0
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret_version" "ui_password" {
|
||||
count = var.ui_password == "" ? 0 : 1
|
||||
|
||||
secret_id = aws_secretsmanager_secret.ui_password[0].id
|
||||
secret_string = var.ui_password
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret" "db_master_password" {
|
||||
name = "${local.name}-db-master-password"
|
||||
description = "Aurora master-user password - bootstrap only. Runtime auth is IAM-token."
|
||||
recovery_window_in_days = 0
|
||||
}
|
||||
|
||||
resource "aws_secretsmanager_secret_version" "db_master_password" {
|
||||
secret_id = aws_secretsmanager_secret.db_master_password.id
|
||||
secret_string = jsonencode({
|
||||
username = var.db_master_username
|
||||
password = random_password.db_master_password.result
|
||||
host = aws_rds_cluster.this.endpoint
|
||||
port = aws_rds_cluster.this.port
|
||||
dbname = var.db_name
|
||||
})
|
||||
}
|
||||
88
terraform/litellm/aws/terraform.tfvars.example
Normal file
@ -0,0 +1,88 @@
|
||||
region = "us-west-2"
|
||||
azs = ["us-west-2a", "us-west-2b"]
|
||||
|
||||
# Resource naming: every AWS resource the stack creates is named
|
||||
# `${tenant}-litellm-${env}` (or that plus a per-resource suffix). E.g.
|
||||
# tenant="acme" + env="stage" → ALB `acme-litellm-stage`, ECS service
|
||||
# `acme-litellm-stage-gateway`, etc.
|
||||
tenant = "acme"
|
||||
env = "stage"
|
||||
|
||||
# Tenant-supplied secrets. Prefer TF_VAR_litellm_master_key /
|
||||
# TF_VAR_litellm_license / TF_VAR_ui_password env vars so the values don't
|
||||
# end up in a committed tfvars file. All three are optional — when
|
||||
# omitted the stack auto-generates a master key, runs without a license,
|
||||
# and falls back to LITELLM_MASTER_KEY for UI login.
|
||||
# litellm_master_key = "sk-..."
|
||||
# litellm_license = "lic-..."
|
||||
# ui_password = "..."
|
||||
|
||||
# TLS: provide an ACM cert for production. Without one, plan fails unless
|
||||
# allow_plaintext_alb = true is set explicitly (trial/dev only).
|
||||
# acm_certificate_arn = "arn:aws:acm:us-west-2:111122223333:certificate/..."
|
||||
# allow_plaintext_alb = true
|
||||
|
||||
# Storage retention: false (default) makes `terraform destroy` refuse on a
|
||||
# non-empty bucket. Flip to true only for ephemeral / CI stacks.
|
||||
# s3_force_destroy = false
|
||||
|
||||
# Component images. Defaults pin all four to the same GHCR release tag —
|
||||
# bump them together when bumping LiteLLM. Override here to pull from a
|
||||
# private registry or to mix-and-match versions.
|
||||
# gateway_image = "ghcr.io/berriai/litellm-gateway:v1.86.0-dev"
# backend_image = "ghcr.io/berriai/litellm-backend:v1.86.0-dev"
# ui_image = "ghcr.io/berriai/litellm-ui:v1.86.0-dev"
# migrations_image = "ghcr.io/berriai/litellm-migrations:v1.86.0-dev"
|
||||
|
||||
# Per-task sizing for the gateway. Defaults are 1 vCPU / 4 GiB / 1 worker.
|
||||
# A common starting point (the Gunicorn rule of thumb) is (2 * vCPU) + 1 workers.
|
||||
# gateway_cpu = 1024 # 1024 = 1 vCPU
|
||||
# gateway_memory = 4096 # MiB
|
||||
# gateway_num_workers = 1
|
||||
|
||||
# ---------- proxy_config (mirrors helm gateway.config.proxy_config) ----------
|
||||
# proxy_config = {
|
||||
# model_list = [
|
||||
# {
|
||||
# model_name = "gpt-4o"
|
||||
# litellm_params = {
|
||||
# model = "openai/gpt-4o"
|
||||
# api_key = "os.environ/OPENAI_API_KEY"
|
||||
# }
|
||||
# },
|
||||
# {
|
||||
# model_name = "claude-sonnet-4-6"
|
||||
# litellm_params = {
|
||||
# model = "anthropic/claude-sonnet-4-6"
|
||||
# api_key = "os.environ/ANTHROPIC_API_KEY"
|
||||
# }
|
||||
# },
|
||||
# ]
|
||||
# general_settings = {
|
||||
# master_key = "os.environ/LITELLM_MASTER_KEY"
|
||||
# database_url = "os.environ/DATABASE_URL"
|
||||
# }
|
||||
# }
|
||||
|
||||
# ---------- Extra env / secrets ----------
|
||||
# Plain-text env vars (non-sensitive). Land directly in the ECS task def.
|
||||
# gateway_extra_env = {
|
||||
# LANGFUSE_HOST = "https://us.cloud.langfuse.com"
|
||||
# }
|
||||
|
||||
# Backend env vars commonly tuned in prod: SSO redirect, docs branding,
|
||||
# UI admin username. UI_PASSWORD is its own first-class var (see top).
|
||||
# backend_extra_env = {
|
||||
# AUTO_REDIRECT_UI_LOGIN_TO_SSO = "true"
|
||||
# DOCS_TITLE = "Acme LiteLLM"
|
||||
# UI_USERNAME = "admin"
|
||||
# }
|
||||
|
||||
# Provider API keys, sourced from existing Secrets Manager entries. The
|
||||
# execution role auto-gains GetSecretValue on each ARN listed here. The
|
||||
# values you reference above as `os.environ/OPENAI_API_KEY` must appear
|
||||
# here. Same shape works for backend_extra_secrets.
|
||||
# gateway_extra_secrets = {
|
||||
# OPENAI_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
|
||||
# ANTHROPIC_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:anthropic-api-key-GhIjKl"
|
||||
# }
|
||||
458
terraform/litellm/aws/variables.tf
Normal file
@ -0,0 +1,458 @@
|
||||
variable "region" {
|
||||
description = "AWS region to deploy into."
|
||||
type = string
|
||||
}
|
||||
|
||||
variable "tenant" {
|
||||
description = "Tenant slug — used as the prefix for every AWS resource the stack creates. Combined with var.env to form `<tenant>-litellm-<env>` (e.g. `acme-litellm-stage`)."
|
||||
type = string
|
||||
|
||||
validation {
|
||||
condition = can(regex("^[a-z][a-z0-9-]{0,20}$", var.tenant))
|
||||
error_message = "tenant must be 1-21 chars, lower-kebab-case, starting with a letter."
|
||||
}
|
||||
}
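# Examples against the tenant pattern ^[a-z][a-z0-9-]{0,20}$ (tenant names
# are hypothetical):
#
#   "acme"       # passes
#   "acme-corp2" # passes
#   "Acme"       # fails: uppercase letter
#   "2acme"      # fails: must start with a letter
#   "acme_corp"  # fails: underscore is not allowed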
|
||||
|
||||
variable "env" {
|
||||
description = "Environment suffix appended to every resource name (e.g. `stage`, `prod`, `dev`)."
|
||||
type = string
|
||||
|
||||
validation {
|
||||
condition = can(regex("^[a-z][a-z0-9-]{0,8}$", var.env))
|
||||
error_message = "env must be 1-9 chars, lower-kebab-case, starting with a letter."
|
||||
}
|
||||
}
|
||||
|
||||
variable "tags" {
|
||||
description = "Additional tags merged into the provider default_tags."
|
||||
type = map(string)
|
||||
default = {}
|
||||
}
|
||||
|
||||
# ---------- Tenant-supplied secrets ----------
|
||||
#
|
||||
# All three default to "" so the stack stays usable for trial / OSS deploys.
# Set via TF_VAR_litellm_master_key / TF_VAR_litellm_license /
# TF_VAR_ui_password to keep the values out of tfvars files committed to a
# VCS. They are still written to Terraform state, so protect the state backend.
|
||||
|
||||
variable "litellm_master_key" {
|
||||
description = <<-EOT
|
||||
Pre-existing LITELLM_MASTER_KEY (must begin with `sk-`). When set, this
|
||||
value is written to the master-key Secrets Manager entry. When empty,
|
||||
the stack auto-generates a random `sk-…` key (preserving today's
|
||||
trial-deploy behavior).
|
||||
EOT
|
||||
type = string
|
||||
default = ""
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
variable "litellm_license" {
|
||||
description = <<-EOT
|
||||
LiteLLM enterprise license string. When set, the stack creates a
|
||||
`<tenant>-litellm-<env>-license` Secrets Manager entry, grants the
|
||||
task-execution role GetSecretValue on it, and exposes its value to
|
||||
gateway + backend as `LITELLM_LICENSE`. Leave empty for OSS-only deploys.
|
||||
EOT
|
||||
type = string
|
||||
default = ""
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
variable "ui_password" {
|
||||
description = <<-EOT
|
||||
UI admin password. When set, the stack creates a
|
||||
`<tenant>-litellm-<env>-ui-password` Secrets Manager entry, grants the
|
||||
task-execution role GetSecretValue on it, and exposes its value to the
|
||||
backend as `UI_PASSWORD`. Pair with `backend_extra_env.UI_USERNAME` to
|
||||
set the matching username. Leave empty to skip — the proxy then falls
|
||||
back to the LITELLM_MASTER_KEY for UI login.
|
||||
EOT
|
||||
type = string
|
||||
default = ""
|
||||
sensitive = true
|
||||
}
|
||||
|
||||
# ---------- Networking ----------
|
||||
|
||||
variable "vpc_cidr" {
|
||||
description = "CIDR block for the VPC."
|
||||
type = string
|
||||
default = "10.40.0.0/16"
|
||||
}
|
||||
|
||||
variable "azs" {
|
||||
description = "Availability zones to spread subnets across. At least 2 required for RDS and ALB."
|
||||
type = list(string)
|
||||
validation {
|
||||
condition = length(var.azs) >= 2
|
||||
error_message = "Provide at least 2 availability zones."
|
||||
}
|
||||
}
|
||||
|
||||
# ---------- Component images ----------
|
||||
#
|
||||
# Defaults pin the four componentized images at the same release tag on
# GHCR. Override per component in tfvars when needed, but bump all four
# together when moving to a new LiteLLM release.
|
||||
|
||||
variable "gateway_image" {
|
||||
description = "Container image for the gateway (data plane, port 4000). Tag must match a tag actually published to GHCR — the split images use the `v`-prefixed semver convention."
|
||||
type = string
|
||||
default = "ghcr.io/berriai/litellm-gateway:v1.86.0-dev"
|
||||
}
|
||||
|
||||
variable "backend_image" {
|
||||
description = "Container image for the backend (management API, port 4001)."
|
||||
type = string
|
||||
default = "ghcr.io/berriai/litellm-backend:v1.86.0-dev"
|
||||
}
|
||||
|
||||
variable "ui_image" {
|
||||
description = "Container image for the UI (nginx static export, port 3000)."
|
||||
type = string
|
||||
default = "ghcr.io/berriai/litellm-ui:v1.86.0-dev"
|
||||
}
|
||||
|
||||
variable "migrations_image" {
|
||||
description = <<-EOT
|
||||
Container image for the one-off prisma migration task. Built from
|
||||
`migrations/Dockerfile` — slim image whose ENTRYPOINT runs
|
||||
`python3 /app/run.py` (assembles DATABASE_URL from DATABASE_* env vars
|
||||
via DatabaseURLSettings, then runs `prisma migrate deploy`). Should track
|
||||
the same release tag as gateway/backend/ui.
|
||||
EOT
|
||||
type = string
|
||||
default = "ghcr.io/berriai/litellm-migrations:v1.86.0-dev"
|
||||
}
|
||||
|
||||
# ---------- Service sizing ----------
|
||||
|
||||
variable "gateway_cpu" {
|
||||
description = "Fargate CPU units for the gateway task (1024 = 1 vCPU)."
|
||||
type = number
|
||||
default = 1024
|
||||
}
|
||||
|
||||
variable "gateway_memory" {
|
||||
description = "Fargate memory (MiB) for the gateway task."
|
||||
type = number
|
||||
default = 4096
|
||||
}
|
||||
|
||||
variable "gateway_desired_count" {
|
||||
description = "Desired number of gateway tasks."
|
||||
type = number
|
||||
default = 2
|
||||
}
|
||||
|
||||
variable "gateway_num_workers" {
|
||||
description = "uvicorn worker processes per gateway task (passed as --workers). Size relative to gateway_cpu; a common starting point (the Gunicorn rule of thumb) is (2 × vCPU) + 1 workers."
|
||||
type = number
|
||||
default = 1
|
||||
|
||||
validation {
|
||||
condition = var.gateway_num_workers >= 1
|
||||
error_message = "gateway_num_workers must be >= 1."
|
||||
}
|
||||
}
|
||||
|
||||
variable "backend_cpu" {
|
||||
description = "Fargate CPU units for the backend task (1024 = 1 vCPU)."
|
||||
type = number
|
||||
default = 1024
|
||||
}
|
||||
|
||||
variable "backend_memory" {
|
||||
description = "Fargate memory (MiB) for the backend task. The proxy_server import chain alone needs >1 GiB; 4 GiB matches gateway."
|
||||
type = number
|
||||
default = 4096
|
||||
}
|
||||
|
||||
variable "backend_desired_count" {
|
||||
description = "Desired number of backend tasks."
|
||||
type = number
|
||||
default = 1
|
||||
}
|
||||
|
||||
variable "ui_cpu" {
|
||||
description = "Fargate CPU units for the UI task."
|
||||
type = number
|
||||
default = 256
|
||||
}
|
||||
|
||||
variable "ui_memory" {
|
||||
description = "Fargate memory (MiB) for the UI task."
|
||||
type = number
|
||||
default = 512
|
||||
}
|
||||
|
||||
variable "ui_desired_count" {
|
||||
description = "Desired number of UI tasks."
|
||||
type = number
|
||||
default = 1
|
||||
}

# ---------- Autoscaling ----------
# Defaults mirror helm/litellm/values.yaml HPAs. The "*_desired_count" vars
# above seed the initial task count; once autoscaling is enabled, the service's
# desired_count is left to Application Auto Scaling (ecs.tf ignores future
# changes to it).

variable "gateway_autoscaling_enabled" {
  description = "Toggle Application Auto Scaling target-tracking on the gateway service."
  type        = bool
  default     = true
}

variable "gateway_min_capacity" {
  description = "Minimum gateway task count under autoscaling."
  type        = number
  default     = 1
}

variable "gateway_max_capacity" {
  description = "Maximum gateway task count under autoscaling."
  type        = number
  default     = 10
}

variable "gateway_cpu_target" {
  description = "Target average CPU utilization (%) for the gateway autoscaling policy."
  type        = number
  default     = 70
}

variable "gateway_memory_target" {
  description = "Target average memory utilization (%) for the gateway autoscaling policy. Set 0 to skip the memory policy and scale on CPU only."
  type        = number
  default     = 80
}

variable "backend_autoscaling_enabled" {
  description = "Toggle Application Auto Scaling target-tracking on the backend service."
  type        = bool
  default     = true
}

variable "backend_min_capacity" {
  description = "Minimum backend task count under autoscaling."
  type        = number
  default     = 1
}

variable "backend_max_capacity" {
  description = "Maximum backend task count under autoscaling."
  type        = number
  default     = 4
}

variable "backend_cpu_target" {
  description = "Target average CPU utilization (%) for the backend autoscaling policy."
  type        = number
  default     = 70
}

variable "ui_autoscaling_enabled" {
  description = "Toggle Application Auto Scaling target-tracking on the UI service. Off by default: the UI is a static nginx export and one task is usually enough."
  type        = bool
  default     = false
}

variable "ui_min_capacity" {
  description = "Minimum UI task count under autoscaling."
  type        = number
  default     = 1
}

variable "ui_max_capacity" {
  description = "Maximum UI task count under autoscaling."
  type        = number
  default     = 3
}

variable "ui_cpu_target" {
  description = "Target average CPU utilization (%) for the UI autoscaling policy."
  type        = number
  default     = 80
}

# ---------- RDS ----------

variable "db_instance_class" {
  description = "Aurora instance class for both writer and reader."
  type        = string
  default     = "db.r6g.large"
}

variable "db_engine_version" {
  description = "Aurora Postgres engine version. Major version drives the parameter-group family (aurora-postgresql<major>)."
  type        = string
  default     = "16.4"
}

variable "db_name" {
  description = "Initial database name created on the Aurora cluster."
  type        = string
  default     = "litellm"
}

variable "db_master_username" {
  description = "Aurora master (superuser) username, used only to bootstrap the IAM-authed application user."
  type        = string
  default     = "postgres"
}

variable "db_username" {
  description = "IAM-authed Postgres user the proxy connects as. Must be CREATEd in the cluster and granted the rds_iam role; see terraform/litellm/aws/README.md."
  type        = string
  default     = "litellm_app"
}

# ---------- Redis ----------

variable "redis_node_type" {
  description = "ElastiCache node type."
  type        = string
  default     = "cache.t4g.small"
}

variable "redis_num_replicas" {
  description = "Number of read replicas in the Redis replication group. The primary plus this many replicas form the cluster: set to 0 for a single-node dev deployment, 1+ for HA. multi_az_enabled and automatic_failover_enabled require >= 1."
  type        = number
  default     = 1

  validation {
    condition     = var.redis_num_replicas >= 0
    error_message = "redis_num_replicas must be >= 0."
  }
}

# ---------- TLS ----------

variable "acm_certificate_arn" {
  description = <<-EOT
    ACM certificate ARN for the ALB's HTTPS listener. When set, the stack
    provisions a 443 listener carrying the same path-routing rules as the 80
    listener, and the 80 listener is rewritten to redirect HTTP→HTTPS. Leave
    empty ("") to disable TLS (must combine with `allow_plaintext_alb = true`
    for the plan to succeed; see README.md "TLS").
  EOT
  type        = string
  default     = ""
}

variable "allow_plaintext_alb" {
  description = <<-EOT
    Opt into HTTP-only mode on the ALB (port 80, no TLS). Default false:
    `terraform plan` fails when `acm_certificate_arn = ""` so the operator
    must either provide an ACM cert or consciously opt out. Intended for
    short-lived trial / dev stacks only.
  EOT
  type        = bool
  default     = false
}

# ---------- Destroy protection (Aurora snapshot, S3 bucket) ----------

variable "skip_final_snapshot" {
  description = "Skip the Aurora final snapshot on `terraform destroy`. Default false: destroying the cluster takes a snapshot first so data is recoverable. Set true only for ephemeral / CI environments where you accept permanent data loss on destroy."
  type        = bool
  default     = false
}

variable "s3_force_destroy" {
  description = <<-EOT
    Allow `terraform destroy` to delete the S3 bucket even when it still
    contains objects (request log archives, /v1/files storage, S3 cache
    backend). Default false: destroying a non-empty bucket fails, acting
    as a tripwire against accidental data loss. Set true only for
    ephemeral / CI environments. Mirrors the safety posture of
    `skip_final_snapshot` on Aurora.
  EOT
  type        = bool
  default     = false
}

# ---------- Extra env ----------

variable "gateway_extra_env" {
  description = <<-EOT
    Additional plain-text env vars for the gateway container. Use this for
    non-sensitive config (LANGFUSE_HOST, custom feature flags, …). For API
    keys, use gateway_extra_secrets instead.
  EOT
  type        = map(string)
  default     = {}
}

variable "backend_extra_env" {
  description = "Additional plain-text env vars for the backend container."
  type        = map(string)
  default     = {}
}

variable "gateway_extra_secrets" {
  description = <<-EOT
    Extra env vars sourced from AWS Secrets Manager. Map of env-var name to
    Secrets Manager ARN. Pass the bare secret ARN to inject the whole secret
    string as the env var value, or append ":<jsonKey>::" to extract a single
    JSON field (ECS docs).

    Example for OPENAI_API_KEY:
      gateway_extra_secrets = {
        OPENAI_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
      }

    The stack's task execution role automatically gains GetSecretValue on every
    ARN referenced here (suffix-stripped).
  EOT
  type        = map(string)
  default     = {}
}

variable "backend_extra_secrets" {
  description = "Same shape as gateway_extra_secrets, but layered onto the backend container."
  type        = map(string)
  default     = {}
}

variable "proxy_config" {
  description = <<-EOT
    LiteLLM proxy config (the contents of config.yaml). Mirrors the helm
    chart's `gateway.config.proxy_config` value. Passed to gateway, backend,
    and the migration task as a base64-encoded env var and decoded to
    /tmp/litellm-config.yaml at container start; CONFIG_FILE_PATH is set
    automatically.

    Example:
      proxy_config = {
        model_list = [
          {
            model_name = "gpt-4o"
            litellm_params = {
              model   = "openai/gpt-4o"
              api_key = "os.environ/OPENAI_API_KEY"
            }
          },
        ]
        general_settings = {
          master_key   = "os.environ/LITELLM_MASTER_KEY"
          database_url = "os.environ/DATABASE_URL"
          ui_username  = "admin"
        }
      }

    Leave empty ({}) to skip mounting a config; the proxy then runs with
    defaults. Use the "os.environ/<NAME>" syntax in the YAML to reference
    env vars provided by *_extra_env or *_extra_secrets.
  EOT
  type        = any
  default     = {}
}

variable "log_retention_days" {
  description = "CloudWatch log retention for the three services."
  type        = number
  default     = 30
}
14
terraform/litellm/aws/versions.tf
Normal file
@ -0,0 +1,14 @@
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.60"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6"
    }
  }
}
62
terraform/litellm/gcp/.terraform.lock.hcl
generated
Normal file
@ -0,0 +1,62 @@
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.

provider "registry.terraform.io/hashicorp/google" {
  version     = "6.50.0"
  constraints = "~> 6.10"
  hashes = [
    "h1:79CwMTsp3Ud1nOl5hFS5mxQHyT0fGVye7pqpU0PPlHI=",
    "zh:1f3513fcfcbf7ca53d667a168c5067a4dd91a4d4cccd19743e248ff31065503c",
    "zh:3da7db8fc2c51a77dd958ea8baaa05c29cd7f829bd8941c26e2ea9cb3aadc1e5",
    "zh:3e09ac3f6ca8111cbb659d38c251771829f4347ab159a12db195e211c76068bb",
    "zh:7bb9e41c568df15ccf1a8946037355eefb4dfb4e35e3b190808bb7c4abae547d",
    "zh:81e5d78bdec7778e6d67b5c3544777505db40a826b6eb5abe9b86d4ba396866b",
    "zh:8d309d020fb321525883f5c4ea864df3d5942b6087f6656d6d8b3a1377f340fc",
    "zh:93e112559655ab95a523193158f4a4ac0f2bfed7eeaa712010b85ebb551d5071",
    "zh:d3efe589ffd625b300cef5917c4629513f77e3a7b111c9df65075f76a46a63c7",
    "zh:d4a4d672bbef756a870d8f32b35925f8ce2ef4f6bbd5b71a3cb764f1b6c85421",
    "zh:e13a86bca299ba8a118e80d5f84fbdd708fe600ecdceea1a13d4919c068379fe",
    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
    "zh:fec30c095647b583a246c39d557704947195a1b7d41f81e369ba377d997faef6",
  ]
}

provider "registry.terraform.io/hashicorp/google-beta" {
  version     = "6.50.0"
  constraints = "~> 6.10"
  hashes = [
    "h1:P2GiUJM1frlPtBViwKn1A9V2dVBdGuWcX80w9TdH8ZE=",
    "zh:18b442bd0a05321d39dda1e9e3f1bdede4e61bc2ac62cc7a67037a3864f75101",
    "zh:2e387c51455862828bec923a3ec81abf63a4d998da470cf00e09003bda53d668",
    "zh:3942e708fa84ebe54996086f4b1398cb747fe19cbcd0be07ace528291fb35dee",
    "zh:496287dd48b34ae6197cb1f887abeafd07c33f389dbe431bb01e24846754cfdd",
    "zh:6eca885419969ce5c2a706f34dce1f10bde9774757675f2d8a92d12e5a1be390",
    "zh:710dbef826c3fe7f76f844dae47937e8e4c1279dd9205ec4610be04cf3327244",
    "zh:777ebf44b24bfc7bdbf770dc089f1a72f143b4718fdedb8c6bd75983115a1ec2",
    "zh:9c8703bba37b8c7ad857efc3513392c5a096c519397c1cb822d7612f38e4262f",
    "zh:c4f1d3a73de2702277c99d5348ad6d374705bcfdd367ad964ff4cfd2cf06c281",
    "zh:eca8df11af3f5a948492d5b8b5d01b4ec705aad10bc30ec1524205508ae28393",
    "zh:f41e7fd5f2628e8fd6b8ea136366923858f54428d1729898925469b862c275c2",
    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
  ]
}

provider "registry.terraform.io/hashicorp/random" {
  version     = "3.8.1"
  constraints = "~> 3.6"
  hashes = [
    "h1:u8AKlWVDTH5r9YLSeswoVEjiY72Rt4/ch7U+61ZDkiQ=",
    "zh:08dd03b918c7b55713026037c5400c48af5b9f468f483463321bd18e17b907b4",
    "zh:0eee654a5542dc1d41920bbf2419032d6f0d5625b03bd81339e5b33394a3e0ae",
    "zh:229665ddf060aa0ed315597908483eee5b818a17d09b6417a0f52fd9405c4f57",
    "zh:2469d2e48f28076254a2a3fc327f184914566d9e40c5780b8d96ebf7205f8bc0",
    "zh:37d7eb334d9561f335e748280f5535a384a88675af9a9eac439d4cfd663bcb66",
    "zh:741101426a2f2c52dee37122f0f4a2f2d6af6d852cb1db634480a86398fa3511",
    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
    "zh:a902473f08ef8df62cfe6116bd6c157070a93f66622384300de235a533e9d4a9",
    "zh:b85c511a23e57a2147355932b3b6dce2a11e856b941165793a0c3d7578d94d05",
    "zh:c5172226d18eaac95b1daac80172287b69d4ce32750c82ad77fa0768be4ea4b8",
    "zh:dab4434dba34aad569b0bc243c2d3f3ff86dd7740def373f2a49816bd2ff819b",
    "zh:f49fd62aa8c5525a5c17abd51e27ca5e213881d58882fd42fec4a545b53c9699",
  ]
}
296
terraform/litellm/gcp/README.md
Normal file
@ -0,0 +1,296 @@
# LiteLLM on GCP (Cloud Run)

Deploys the componentized LiteLLM proxy on GCP:

- **VPC** + Private Services Access range + a Serverless VPC Access connector
  so Cloud Run can reach private IPs
- **Cloud SQL for PostgreSQL**: primary instance + cross-zone read replica,
  password auth via Secret Manager
- **Memorystore (Redis)** for caching + rate limiting, private IP only
- **GCS bucket**: private, versioned, uniform IAM; exposed as `GCS_BUCKET_NAME`
- **Secret Manager** entries for `LITELLM_MASTER_KEY` and `DATABASE_PASSWORD`
- **Cloud Run v2** services for `gateway` (port 4000), `backend` (port 4001),
  and `ui` (port 3000), all using a shared runtime service account
- **Cloud Run Job** (`litellm-migrations`) that runs `prisma migrate deploy`
  from the dedicated `ghcr.io/berriai/litellm-migrations` image
- **External global HTTP(S) load balancer** with serverless NEGs and a URL
  map mirroring the helm-chart ingress path routing:
  - LLM data-plane prefixes → `gateway`
  - UI asset paths → `ui`
  - Everything else → `backend`
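
The routing rule in the last bullet amounts to first-match prefix routing with a default service. The Python sketch below is only an illustration of that idea: the prefix values are placeholders (the real lists live in `locals.tf` and are not reproduced here), and `route` is a hypothetical helper, not part of the stack.

```python
# Placeholder prefix lists, assumed for illustration only. The stack's real
# lists mirror the helm chart's ingress and are defined in locals.tf.
GATEWAY_PREFIXES = ("/v1/", "/chat/")
UI_PREFIXES = ("/ui/",)


def route(path: str) -> str:
    """Return the component that would serve `path` under prefix routing."""
    if path.startswith(GATEWAY_PREFIXES):
        return "gateway"
    if path.startswith(UI_PREFIXES):
        return "ui"
    # Anything that matches no prefix falls through to the default service.
    return "backend"


print(route("/v1/chat/completions"))  # gateway
print(route("/ui/index.html"))        # ui
print(route("/key/generate"))         # backend
```

The useful property is the default: a path that is not explicitly listed never returns a 404 from the load balancer itself, it lands on `backend`.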

## Image pulls

There are four images: `litellm-gateway`, `litellm-backend`, `litellm-ui`,
and `litellm-migrations` (a slim image used only by the one-off Cloud Run
Job, which runs `prisma migrate deploy` against the writer DB and exits).
Bump them together when bumping LiteLLM.

Cloud Run only accepts images from Artifact Registry, `[region.]gcr.io`,
or `docker.io`; `ghcr.io` URIs are rejected at apply time. The four
images are published to GHCR upstream, so any real deploy needs an
Artifact Registry remote repository pointed at GHCR.

**One-time setup (per project):** create a remote repo and let Cloud Run
pull through it.

```bash
gcloud artifacts repositories create litellm \
  --repository-format=docker \
  --location=us-central1 \
  --mode=remote-repository \
  --remote-repo-config-desc="GitHub Container Registry passthrough" \
  --remote-docker-repo=https://ghcr.io
```

Then point the stack at it via `image_registry`:

```hcl
image_registry = "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai"
image_tag      = "v1.86.0-dev"
```

The four `litellm-<component>:${image_tag}` URIs are composed from those
two vars. Set `gateway_image` / `backend_image` / `ui_image` /
`migrations_image` only if you need a per-component override (custom
build, different tag).
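
As a rough illustration of how the four URIs fall out of `image_registry` and `image_tag`, here is a small Python sketch. The function name and the rule that a non-empty per-component override wins are assumptions made for the example; the actual composition happens in the stack's Terraform locals.

```python
COMPONENTS = ("gateway", "backend", "ui", "migrations")


def image_uris(image_registry, image_tag, overrides=None):
    """Compose one image URI per component; a non-empty override wins."""
    overrides = overrides or {}
    return {
        c: overrides.get(c) or f"{image_registry}/litellm-{c}:{image_tag}"
        for c in COMPONENTS
    }


uris = image_uris(
    "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai",
    "v1.86.0-dev",
)
print(uris["gateway"])
# us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai/litellm-gateway:v1.86.0-dev
```

Because every URI is derived from the same tag, bumping `image_tag` moves all four components together, which is the behaviour the section recommends.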

Two further notes:

- The runtime SAs the stack creates do **not** need
  `roles/artifactregistry.reader`. Cloud Run pulls images using the
  per-project serverless agent
  (`service-<project-num>@serverless-robot-prod.iam.gserviceaccount.com`),
  not the runtime SA.
- For a fully air-gapped option, mirror the images into a regular AR
  repository instead of a remote repo:

  ```bash
  # Set PROJECT and TAG first, e.g. TAG=v1.86.0-dev.
  # Keep the litellm-<component> image name so the composed URIs still resolve.
  for c in gateway backend ui migrations; do
    docker pull "ghcr.io/berriai/litellm-$c:$TAG"
    docker tag "ghcr.io/berriai/litellm-$c:$TAG" \
      "us-central1-docker.pkg.dev/$PROJECT/litellm/litellm-$c:$TAG"
    docker push "us-central1-docker.pkg.dev/$PROJECT/litellm/litellm-$c:$TAG"
  done
  ```

  then set `image_registry = "us-central1-docker.pkg.dev/$PROJECT/litellm"`
  (drop the `/berriai` suffix, since the mirrored layout has no org segment).

## Database authentication

LiteLLM's `init_iam_db_url_from_env()` mints **AWS RDS** tokens via boto3;
it doesn't speak GCP IAM. To IAM-auth against Cloud SQL from Cloud Run you'd
need the Cloud SQL Auth Proxy as a sidecar, which complicates the service
spec. This stack therefore uses **password authentication**:

- A random password is generated and stored in Secret Manager
  (`<name>-db-password`).
- Each Cloud Run service receives the password as `DATABASE_PASSWORD` via
  `value_source.secret_key_ref`.
- The container's entrypoint shim assembles `DATABASE_URL` (and
  `DATABASE_URL_READ_REPLICA`) from `DATABASE_HOST` / `DATABASE_PASSWORD`
  before exec'ing uvicorn, so the password never appears in the service
  spec or in logs.
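
The assembly step in the last bullet can be sketched in Python as follows. This is not the stack's actual shim: `build_database_url` is a hypothetical helper, and percent-encoding the credentials is an assumption added here because a randomly generated password may contain characters such as `@` or `/` that would otherwise break the URL. The `DATABASE_*` names match the env vars the stack sets.

```python
import os
from urllib.parse import quote


def build_database_url(env=os.environ,
                       host_key="DATABASE_HOST",
                       port_key="DATABASE_PORT"):
    """Assemble a Postgres URL from DATABASE_* variables (illustrative only)."""
    user = quote(env["DATABASE_USER"], safe="")
    password = quote(env["DATABASE_PASSWORD"], safe="")  # escape @ / : etc.
    host = env[host_key]
    port = env.get(port_key, "5432")
    return f"postgresql://{user}:{password}@{host}:{port}/{env['DATABASE_NAME']}"


example_env = {
    "DATABASE_USER": "litellm",
    "DATABASE_PASSWORD": "p@ss/w:rd",
    "DATABASE_HOST": "10.0.0.5",
    "DATABASE_PORT": "5432",
    "DATABASE_NAME": "litellm",
}
print(build_database_url(example_env))
# postgresql://litellm:p%40ss%2Fw%3Ard@10.0.0.5:5432/litellm
```

Passing `host_key="DATABASE_HOST_READ_REPLICA"` and `port_key="DATABASE_PORT_READ_REPLICA"` would produce the read-replica URL in the same way.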

If you need GCP-native IAM auth later, add `cloud-sql-proxy` as a sidecar
container under `template.containers` of each `google_cloud_run_v2_service`
(Cloud Run v2 supports multiple containers) and replace the password-based
URL with the proxy's Unix socket.

## Configuring the proxy

### `proxy_config`

Mirrors the helm chart's `gateway.config.proxy_config`. The map is
YAML-encoded, then base64-encoded, and passed as an env var to gateway,
backend, and the migration job; each container decodes it at startup to
`/tmp/litellm-config.yaml`, the path `CONFIG_FILE_PATH` is set to.

```hcl
proxy_config = {
  model_list = [
    {
      model_name = "gpt-4o"
      litellm_params = {
        model   = "openai/gpt-4o"
        api_key = "os.environ/OPENAI_API_KEY"
      }
    },
  ]
  general_settings = {
    master_key   = "os.environ/LITELLM_MASTER_KEY"
    database_url = "os.environ/DATABASE_URL"
  }
}
```

LiteLLM resolves `os.environ/<NAME>` references against the container
environment. Provider API keys belong in `*_extra_secrets` and are
referenced from the YAML by env-var name.
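
To make the two mechanisms above concrete, here is a short Python sketch of the base64 round trip and of an `os.environ/<NAME>` lookup. It is an illustration under assumptions: the `env` dict stands in for the container environment, and `resolve` is a hypothetical helper that mimics the documented behaviour rather than LiteLLM's own implementation. The variable names `LITELLM_PROXY_CONFIG_B64` and `CONFIG_FILE_PATH` are the ones the stack's startup one-liner reads.

```python
import base64
import pathlib
import tempfile

config_yaml = (
    "general_settings:\n"
    "  master_key: os.environ/LITELLM_MASTER_KEY\n"
)

# Stand-in for the container environment Terraform would provide.
config_path = pathlib.Path(tempfile.mkdtemp()) / "litellm-config.yaml"
env = {
    "LITELLM_PROXY_CONFIG_B64": base64.b64encode(config_yaml.encode()).decode(),
    "CONFIG_FILE_PATH": str(config_path),
    "LITELLM_MASTER_KEY": "sk-example",
}

# Startup step: decode the env var into the file CONFIG_FILE_PATH names.
pathlib.Path(env["CONFIG_FILE_PATH"]).write_bytes(
    base64.b64decode(env["LITELLM_PROXY_CONFIG_B64"])
)


def resolve(value, env):
    """Swap an "os.environ/<NAME>" reference for the value of NAME."""
    prefix = "os.environ/"
    if isinstance(value, str) and value.startswith(prefix):
        return env[value[len(prefix):]]
    return value


print(config_path.read_text() == config_yaml)         # True
print(resolve("os.environ/LITELLM_MASTER_KEY", env))  # sk-example
```

The point of the indirection is that the config file itself never carries a secret value, only the name of the env var that holds it.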

### Extra env / secrets

Non-sensitive env vars:

```hcl
gateway_extra_env = {
  LANGFUSE_HOST = "https://us.cloud.langfuse.com"
}
```

Sensitive values: create the secret in Secret Manager first, then reference
its resource ID:

```bash
echo -n "sk-proj-..." | gcloud secrets create openai-api-key --data-file=-
```

```hcl
gateway_extra_secrets = {
  OPENAI_API_KEY = "projects/my-gcp-project/secrets/openai-api-key"
}
```

The Cloud Run runtime SA auto-gains `roles/secretmanager.secretAccessor` on
every secret referenced. **Pass the bare secret resource ID only**:
`projects/.../secrets/openai-api-key`, never the version-suffixed form
`projects/.../secrets/openai-api-key/versions/3`. The Cloud Run
`secret_key_ref` binding and the stack's IAM `secret_id` grant both
reject the version suffix; version is always resolved as `latest`. If
you need a pinned version, edit `local.gateway_extra_secret_kv` in
`cloudrun.tf` directly to set `version = "3"` for the entry in question.
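
A quick pre-flight check can catch a version-suffixed ID before `terraform apply` does. The Python sketch below is a hypothetical helper, not something the stack ships; it assumes the bare form is exactly `projects/<project>/secrets/<name>` as described above.

```python
import re

# Bare secret resource ID: projects/<project>/secrets/<name>, nothing after it.
SECRET_ID = re.compile(r"projects/[^/]+/secrets/[^/]+")


def invalid_secret_refs(extra_secrets):
    """Return env-var names whose value is not a bare secret resource ID."""
    return [
        name
        for name, ref in extra_secrets.items()
        if not SECRET_ID.fullmatch(ref)
    ]


print(invalid_secret_refs({
    "OPENAI_API_KEY": "projects/my-gcp-project/secrets/openai-api-key",
    "PINNED_KEY": "projects/my-gcp-project/secrets/openai-api-key/versions/3",
}))
# ['PINNED_KEY']
```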

## Tenant deployment

Every resource the stack creates is named `${tenant}-litellm-${env}` (or
that plus a per-resource suffix), so multiple tenants and multiple
environments coexist in the same project as long as the `(tenant, env)`
pair differs:

| `tenant` | `env`   | Example resource name          |
| -------- | ------- | ------------------------------ |
| `acme`   | `stage` | `acme-litellm-stage-gateway`   |
| `acme`   | `prod`  | `acme-litellm-prod-master-key` |
| `globex` | `dev`   | `globex-litellm-dev-license`   |

For a per-tenant instance, the only inputs that change are the tenant
slug, env, and the two pre-issued secrets:

```bash
export TF_VAR_litellm_master_key="sk-..."   # the tenant's master key
export TF_VAR_litellm_license="lic-..."     # their LITELLM_LICENSE

terraform apply \
  -var "project=my-gcp-project" \
  -var "region=us-central1" \
  -var "tenant=acme" \
  -var "env=stage"
```

Both `litellm_master_key` and `litellm_license` are optional:

- Omit `litellm_master_key` → the stack auto-generates a random `sk-…`
  value (trial/dev path).
- Omit `litellm_license` → no license secret is created and gateway/backend
  run without `LITELLM_LICENSE` (OSS-only).

Use `TF_VAR_*` env vars rather than tfvars files for these: values written
to a tfvars file sit in plaintext on disk and are easy to commit by
accident. Either way the values are still recorded in `terraform.tfstate`,
so keep the state backend access-controlled.

## Quick start

```bash
cd terraform/litellm/gcp
cp terraform.tfvars.example terraform.tfvars
# Edit: project, region, tenant, env, *_image, proxy_config, gateway_extra_secrets.

terraform init
terraform apply
```

That single apply provisions everything, runs the prisma schema migration via
the Cloud Run job (auto-triggered by `bootstrap.tf`), and only then starts the
gateway/backend services. When it returns, the stack is serving traffic.

```bash
terraform output lb_url
# UI login: admin / <master key>
gcloud secrets versions access latest --secret="$(terraform output -raw master_key_secret_id)"
```

The `migration_run_command` output is preserved for break-glass manual re-runs.

**Prerequisite**: `gcloud` must be authenticated (`gcloud auth login`) and the
required APIs must be enabled (run, sqladmin, redis, secretmanager,
vpcaccess, compute, servicenetworking, storage, artifactregistry).

## TLS

`terraform plan` refuses to provision an HTTP-only LB by default; TLS
is the supported posture. Two paths:

**Production / staging (set `lb_domains`):**

1. `terraform apply` once with `allow_plaintext_lb = true` to provision the
   LB and read the anycast IP from `terraform output -raw lb_ip`. This is an
   intentional chicken-and-egg escape hatch: the managed cert cannot be
   issued until DNS points at the LB's IP.
2. Point each DNS name you want to serve at that IP.
3. Set `lb_domains = ["proxy.example.com"]` and remove
   `allow_plaintext_lb`; re-apply.

Result: a 443 forwarding rule with a Google-managed cert covering each
listed domain; the 80 forwarding rule is rewritten to serve a permanent
301 redirect to HTTPS, so HTTP clients are automatically upgraded. The
managed cert sits in `PROVISIONING` for ~15-60 min on first apply until
DNS propagation completes;
`gcloud compute ssl-certificates describe <tenant>-litellm-<env>-cert`
shows the state.

**Trial / dev (explicitly opt into HTTP-only):**

Set `allow_plaintext_lb = true` and leave `lb_domains = []`. Without the
flag, plan fails with a clear error pointing at the precondition.
Intended for short-lived trial / dev stacks only.

## Storage and database retention

Two tripwires, both on by default, guard against accidental data loss on
`terraform destroy`:

- **`cloudsql_deletion_protection`** (Cloud SQL writer + reader;
  default `true`): destroy fails with a clear error rather than
  dropping the database.
- **`gcs_force_destroy`** (GCS bucket holding request log archives,
  `/v1/files` content, and the GCS cache backend; default `false`):
  `terraform destroy` against a non-empty bucket fails.

Flip `cloudsql_deletion_protection` to `false` or `gcs_force_destroy` to
`true` only for ephemeral / CI stacks where you accept losing the data.

## Redis encryption

Memorystore runs with `transit_encryption_mode = "SERVER_AUTHENTICATION"`,
so the proxy connects via `rediss://`. The instance's self-signed CA cert
(`server_ca_certs[0].cert`) is shipped to gateway + backend as
`REDIS_CA_PEM_B64`; their entrypoint shell decodes it to `/tmp/redis-ca.pem`
before uvicorn starts and points `REDIS_SSL_CA_CERTS` at that path. No
extra config is needed. If you ever swap Memorystore for an external
Redis, override `REDIS_HOST`/`REDIS_PORT` and either drop these env vars
or point them at your own CA.

## Files

| File               | What's in it                                                                 |
| ------------------ | ---------------------------------------------------------------------------- |
| `versions.tf`      | Terraform + provider version constraints                                     |
| `providers.tf`     | Google + Google-Beta providers                                               |
| `variables.tf`     | All input variables                                                          |
| `locals.tf`        | Path-prefix lists (mirror of `helm/.../ingress.yaml`) + proxy_config helpers |
| `network.tf`       | VPC, subnet, PSA range, Serverless VPC connector                             |
| `secrets.tf`       | Secret Manager entries + random master_key                                   |
| `cloudsql.tf`      | Cloud SQL writer + read replica + app user + password secret                 |
| `redis.tf`         | Memorystore Redis (private IP)                                               |
| `gcs.tf`           | GCS bucket + objectAdmin binding                                             |
| `iam.tf`           | Runtime SA + Cloud SQL client + Secret Manager accessor                      |
| `cloudrun.tf`      | 3 Cloud Run services + Cloud Run Job for migrations                          |
| `bootstrap.tf`     | `terraform_data.migration`, which runs the migration job during apply        |
| `load_balancer.tf` | External HTTPS LB, serverless NEGs, URL map for path routing                 |
| `outputs.tf`       | LB IP, service URLs, secret IDs, migration `execute` command                 |
43
terraform/litellm/gcp/bootstrap.tf
Normal file
@ -0,0 +1,43 @@
# Auto-runs the prisma schema migration as part of `terraform apply`. Mirrors
# the AWS stack's terraform_data.migration in spirit. Cloud SQL doesn't need a
# separate user-bootstrap step because google_sql_user.app already creates the
# application user, so the only post-cluster work is the migration.
#
# Gateway/backend Cloud Run services depend on this resource (in cloudrun.tf)
# so they don't go live until the schema is in place.
#
# Triggers:
#   - re-runs if the migrations image changes (new release ships new prisma
#     migration files).
#   - re-runs if the migration job is recreated.
#
# Requires `gcloud` on the machine running terraform, authenticated
# (`gcloud auth login`) with credentials that can invoke Cloud Run admin APIs.

resource "terraform_data" "migration" {
  triggers_replace = {
    job_id    = google_cloud_run_v2_job.migrations.id
    job_image = local.migrations_image
  }

  provisioner "local-exec" {
    interpreter = ["bash", "-c"]
    environment = {
      JOB     = google_cloud_run_v2_job.migrations.name
      REGION  = var.region
      PROJECT = var.project
    }
    command = <<-EOT
      set -euo pipefail
      gcloud run jobs execute "$JOB" \
        --region "$REGION" \
        --project "$PROJECT" \
        --wait
    EOT
  }

  depends_on = [
    google_cloud_run_v2_job.migrations,
    google_sql_user.app,
  ]
}
430
terraform/litellm/gcp/cloudrun.tf
Normal file
@ -0,0 +1,430 @@
|
||||
# Three Cloud Run v2 services + one Cloud Run v2 job for migrations.
|
||||
# All four use the same service account and the same VPC connector for
|
||||
# private egress to Cloud SQL + Memorystore.
|
||||
|
||||
locals {
|
||||
# Memorystore exposes a self-signed CA cert per instance; we ship it as
|
||||
# a base64 env var and decode it to a file at container startup so the
|
||||
# rediss:// connection can validate. Public cert, not sensitive.
|
||||
redis_ca_pem_b64 = base64encode(google_redis_instance.this.server_ca_certs[0].cert)
|
||||
|
||||
shared_env_kv = [
|
||||
{ name = "DATABASE_HOST", value = google_sql_database_instance.writer.private_ip_address },
|
||||
{ name = "DATABASE_PORT", value = "5432" },
|
||||
{ name = "DATABASE_USER", value = var.db_username },
|
||||
{ name = "DATABASE_NAME", value = var.db_name },
|
||||
{ name = "DATABASE_HOST_READ_REPLICA", value = google_sql_database_instance.reader.private_ip_address },
|
||||
{ name = "DATABASE_PORT_READ_REPLICA", value = "5432" },
|
||||
{ name = "REDIS_HOST", value = google_redis_instance.this.host },
|
||||
{ name = "REDIS_PORT", value = tostring(google_redis_instance.this.port) },
|
||||
# _redis.get_redis_url_from_environment honors REDIS_SSL to flip the
|
||||
# scheme to rediss://; REDIS_SSL_CA_CERTS is mapped via
|
||||
# _get_redis_env_kwarg_mapping → ssl_ca_certs on the redis-py client.
|
||||
{ name = "REDIS_SSL", value = "true" },
|
||||
{ name = "REDIS_SSL_CA_CERTS", value = "/tmp/redis-ca.pem" },
|
||||
{ name = "REDIS_CA_PEM_B64", value = local.redis_ca_pem_b64 },
|
||||
{ name = "GCS_BUCKET_NAME", value = google_storage_bucket.this.name },
|
||||
]
|
||||
|
||||
# Cloud Run v2 secret env vars use value_source.secret_key_ref pointing at a
|
||||
# secret resource ID. Shared between gateway and backend (the migrations
|
||||
# job has its own narrower env list — see migrations_env_secrets below).
|
||||
shared_env_secrets = concat(
|
||||
[
|
||||
{ name = "LITELLM_MASTER_KEY", secret = google_secret_manager_secret.master_key.id, version = "latest" },
|
||||
{ name = "DATABASE_PASSWORD", secret = google_secret_manager_secret.db_password.id, version = "latest" },
|
||||
],
|
||||
var.litellm_license == "" ? [] : [
|
||||
{ name = "LITELLM_LICENSE", secret = google_secret_manager_secret.license[0].id, version = "latest" },
|
||||
],
|
||||
)
|
||||
|
||||
# Backend-only managed secrets. UI_PASSWORD is consumed by the management
|
||||
# API (UI login flow) and has no use on the gateway data plane.
|
||||
backend_managed_env_secrets = var.ui_password == "" ? [] : [
|
||||
{ name = "UI_PASSWORD", secret = google_secret_manager_secret.ui_password[0].id, version = "latest" },
|
||||
]
|
||||
|
||||
# Per-component extras (from variables).
|
||||
gateway_extra_env_kv = [
|
||||
for k, v in var.gateway_extra_env : { name = k, value = v }
|
||||
]
|
||||
backend_extra_env_kv = [
|
||||
for k, v in var.backend_extra_env : { name = k, value = v }
|
||||
]
|
||||
|
||||
backend_default_env_kv = [
|
||||
{ name = "STORE_MODEL_IN_DB", value = "true" },
|
||||
]
|
||||
gateway_extra_secret_kv = [
|
||||
for k, v in var.gateway_extra_secrets : { name = k, secret = v, version = "latest" }
|
||||
]
|
||||
backend_extra_secret_kv = [
|
||||
for k, v in var.backend_extra_secrets : { name = k, secret = v, version = "latest" }
|
||||
]
|
||||
|
||||
# Shell fragments composed with && so any failure short-circuits the
# whole startup instead of falling through to `exec uvicorn`. The
# proxy-config decode step is only included when the caller provided a
# proxy_config; the Redis CA decode and DATABASE_URL export always run.
|
||||
proxy_config_fragment = local.proxy_config_enabled ? [
|
||||
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\""
|
||||
] : []
|
||||
|
||||
# Decode the Memorystore CA cert (passed as REDIS_CA_PEM_B64) to the
|
||||
# path REDIS_SSL_CA_CERTS points at, so the redis-py client can validate
|
||||
# the rediss:// handshake.
|
||||
redis_ca_fragment = [
|
||||
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['REDIS_SSL_CA_CERTS']).write_bytes(base64.b64decode(os.environ['REDIS_CA_PEM_B64']))\""
|
||||
]
|
||||
|
||||
database_url_fragment = [
|
||||
"export DATABASE_URL=\"postgresql://$${DATABASE_USER}:$${DATABASE_PASSWORD}@$${DATABASE_HOST}:$${DATABASE_PORT}/$${DATABASE_NAME}\"",
|
||||
"export DATABASE_URL_READ_REPLICA=\"postgresql://$${DATABASE_USER}:$${DATABASE_PASSWORD}@$${DATABASE_HOST_READ_REPLICA}:$${DATABASE_PORT_READ_REPLICA}/$${DATABASE_NAME}\"",
|
||||
]
|
||||
|
||||
gateway_args = join(" && ", concat(
|
||||
local.proxy_config_fragment,
|
||||
local.redis_ca_fragment,
|
||||
local.database_url_fragment,
|
||||
["exec uvicorn gateway.main:app --host 0.0.0.0 --port 4000"],
|
||||
))
|
||||
|
||||
backend_args = join(" && ", concat(
|
||||
local.proxy_config_fragment,
|
||||
local.redis_ca_fragment,
|
||||
local.database_url_fragment,
|
||||
["exec uvicorn backend.main:app --host 0.0.0.0 --port 4001"],
|
||||
))
|
||||
|
||||
# Env shipped to the migrations Job. The migrations image runs run.py
|
||||
# which assembles DATABASE_URL from these discrete vars itself, so we
|
||||
# only need writer-side DB env (no read replica, no proxy_config, no
|
||||
# master key).
|
||||
migrations_env_kv = [
|
||||
{ name = "DATABASE_HOST", value = google_sql_database_instance.writer.private_ip_address },
|
||||
{ name = "DATABASE_PORT", value = "5432" },
|
||||
{ name = "DATABASE_USER", value = var.db_username },
|
||||
{ name = "DATABASE_NAME", value = var.db_name },
|
||||
]
|
||||
|
||||
migrations_env_secrets = [
|
||||
{ name = "DATABASE_PASSWORD", secret = google_secret_manager_secret.db_password.id, version = "latest" },
|
||||
]
|
||||
}
|
||||
|
||||
# ---------- Gateway ----------
|
||||
resource "google_cloud_run_v2_service" "gateway" {
|
||||
name = "${local.name}-gateway"
|
||||
location = var.region
|
||||
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
|
||||
|
||||
template {
|
||||
service_account = google_service_account.runtime.email
|
||||
max_instance_request_concurrency = var.gateway_max_instance_request_concurrency
|
||||
|
||||
vpc_access {
|
||||
connector = google_vpc_access_connector.this.id
|
||||
egress = "PRIVATE_RANGES_ONLY"
|
||||
}
|
||||
|
||||
scaling {
|
||||
min_instance_count = var.gateway_min_instances
|
||||
max_instance_count = var.gateway_max_instances
|
||||
}
|
||||
|
||||
containers {
|
||||
image = local.gateway_image
|
||||
command = ["sh", "-c"]
|
||||
args = [local.gateway_args]
|
||||
|
||||
ports {
|
||||
container_port = 4000
|
||||
}
|
||||
|
||||
resources {
|
||||
limits = {
|
||||
cpu = var.gateway_cpu
|
||||
memory = var.gateway_memory
|
||||
}
|
||||
}
|
||||
|
||||
dynamic "env" {
|
||||
for_each = concat(local.shared_env_kv, local.gateway_extra_env_kv, local.proxy_config_env)
|
||||
content {
|
||||
name = env.value.name
|
||||
value = env.value.value
|
||||
}
|
||||
}
|
||||
|
||||
dynamic "env" {
|
||||
for_each = concat(local.shared_env_secrets, local.gateway_extra_secret_kv)
|
||||
content {
|
||||
name = env.value.name
|
||||
value_source {
|
||||
secret_key_ref {
|
||||
secret = env.value.secret
|
||||
version = env.value.version
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
startup_probe {
|
||||
http_get {
|
||||
path = "/health/readiness"
|
||||
port = 4000
|
||||
}
|
||||
initial_delay_seconds = 10
|
||||
period_seconds = 10
|
||||
timeout_seconds = 5
|
||||
failure_threshold = 12
|
||||
}
|
||||
|
||||
liveness_probe {
|
||||
http_get {
|
||||
path = "/health/liveliness"
|
||||
port = 4000
|
||||
}
|
||||
period_seconds = 30
|
||||
timeout_seconds = 5
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
depends_on = [
|
||||
google_secret_manager_secret_iam_member.master_key,
|
||||
google_secret_manager_secret_iam_member.db_password,
|
||||
google_secret_manager_secret_iam_member.license,
|
||||
google_secret_manager_secret_iam_member.extras,
|
||||
google_sql_user.app,
|
||||
# Don't go live until the schema is migrated; otherwise the proxy boots,
|
||||
# fails on missing tables, and Cloud Run keeps cold-restarting.
|
||||
terraform_data.migration,
|
||||
]
|
||||
}
|
||||
|
||||
# ---------- Backend ----------
|
||||
resource "google_cloud_run_v2_service" "backend" {
|
||||
name = "${local.name}-backend"
|
||||
location = var.region
|
||||
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
|
||||
|
||||
template {
|
||||
service_account = google_service_account.runtime.email
|
||||
max_instance_request_concurrency = var.backend_max_instance_request_concurrency
|
||||
|
||||
vpc_access {
|
||||
connector = google_vpc_access_connector.this.id
|
||||
egress = "PRIVATE_RANGES_ONLY"
|
||||
}
|
||||
|
||||
scaling {
|
||||
min_instance_count = var.backend_min_instances
|
||||
max_instance_count = var.backend_max_instances
|
||||
}
|
||||
|
||||
containers {
|
||||
image = local.backend_image
|
||||
command = ["sh", "-c"]
|
||||
args = [local.backend_args]
|
||||
|
||||
ports {
|
||||
container_port = 4001
|
||||
}
|
||||
|
||||
resources {
|
||||
limits = {
|
||||
cpu = var.backend_cpu
|
||||
memory = var.backend_memory
|
||||
}
|
||||
}
|
||||
|
||||
dynamic "env" {
|
||||
for_each = concat(local.shared_env_kv, local.backend_default_env_kv, local.backend_extra_env_kv, local.proxy_config_env)
|
||||
content {
|
||||
name = env.value.name
|
||||
value = env.value.value
|
||||
}
|
||||
}
|
||||
|
||||
dynamic "env" {
|
||||
for_each = concat(local.shared_env_secrets, local.backend_managed_env_secrets, local.backend_extra_secret_kv)
|
||||
content {
|
||||
name = env.value.name
|
||||
value_source {
|
||||
secret_key_ref {
|
||||
secret = env.value.secret
|
||||
version = env.value.version
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
startup_probe {
|
||||
http_get {
|
||||
path = "/health/readiness"
|
||||
port = 4001
|
||||
}
|
||||
initial_delay_seconds = 10
|
||||
period_seconds = 10
|
||||
timeout_seconds = 5
|
||||
failure_threshold = 12
|
||||
}
|
||||
|
||||
liveness_probe {
|
||||
http_get {
|
||||
path = "/health/liveliness"
|
||||
port = 4001
|
||||
}
|
||||
period_seconds = 30
|
||||
timeout_seconds = 5
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
depends_on = [
|
||||
google_secret_manager_secret_iam_member.master_key,
|
||||
google_secret_manager_secret_iam_member.db_password,
|
||||
google_secret_manager_secret_iam_member.license,
|
||||
google_secret_manager_secret_iam_member.ui_password,
|
||||
google_secret_manager_secret_iam_member.extras,
|
||||
google_sql_user.app,
|
||||
terraform_data.migration,
|
||||
]
|
||||
}
|
||||
|
||||
# ---------- UI ----------
|
||||
# Static nginx — no DB, no Redis, no secrets. Runs as ui_runtime, a SA
|
||||
# with zero IAM bindings, so a compromised UI container can't pivot to
|
||||
# Secret Manager / Cloud SQL via the metadata service.
|
||||
resource "google_cloud_run_v2_service" "ui" {
|
||||
name = "${local.name}-ui"
|
||||
location = var.region
|
||||
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
|
||||
|
||||
template {
|
||||
service_account = google_service_account.ui_runtime.email
|
||||
max_instance_request_concurrency = var.ui_max_instance_request_concurrency
|
||||
|
||||
scaling {
|
||||
min_instance_count = var.ui_min_instances
|
||||
max_instance_count = var.ui_max_instances
|
||||
}
|
||||
|
||||
containers {
|
||||
image = local.ui_image
|
||||
|
||||
ports {
|
||||
container_port = 3000
|
||||
}
|
||||
|
||||
resources {
|
||||
limits = {
|
||||
cpu = var.ui_cpu
|
||||
memory = var.ui_memory
|
||||
}
|
||||
}
|
||||
|
||||
startup_probe {
|
||||
http_get {
|
||||
path = "/healthz"
|
||||
port = 3000
|
||||
}
|
||||
initial_delay_seconds = 5
|
||||
period_seconds = 10
|
||||
timeout_seconds = 3
|
||||
failure_threshold = 6
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Allow the LB (any unauthenticated traffic from the configured serverless
|
||||
# NEG) to invoke the Cloud Run services. The actual auth is in the proxy
|
||||
# (LITELLM_MASTER_KEY); these IAM bindings just open up Cloud Run's invoker
|
||||
# gate so the LB request makes it to the container.
|
||||
resource "google_cloud_run_v2_service_iam_member" "gateway_allusers" {
|
||||
project = var.project
|
||||
location = google_cloud_run_v2_service.gateway.location
|
||||
name = google_cloud_run_v2_service.gateway.name
|
||||
role = "roles/run.invoker"
|
||||
member = "allUsers"
|
||||
}
|
||||
|
||||
resource "google_cloud_run_v2_service_iam_member" "backend_allusers" {
|
||||
project = var.project
|
||||
location = google_cloud_run_v2_service.backend.location
|
||||
name = google_cloud_run_v2_service.backend.name
|
||||
role = "roles/run.invoker"
|
||||
member = "allUsers"
|
||||
}
|
||||
|
||||
resource "google_cloud_run_v2_service_iam_member" "ui_allusers" {
|
||||
project = var.project
|
||||
location = google_cloud_run_v2_service.ui.location
|
||||
name = google_cloud_run_v2_service.ui.name
|
||||
role = "roles/run.invoker"
|
||||
member = "allUsers"
|
||||
}
|
||||
|
||||
# ---------- Migrations job ----------
|
||||
# Dedicated litellm-migrations image — slim, ENTRYPOINT runs run.py which
|
||||
# assembles DATABASE_URL from the DATABASE_* env vars and runs `prisma
|
||||
# migrate deploy`. No proxy_config, no master key, no shell wrapper.
|
||||
resource "google_cloud_run_v2_job" "migrations" {
|
||||
name = "${local.name}-migrations"
|
||||
location = var.region
|
||||
|
||||
template {
|
||||
template {
|
||||
service_account = google_service_account.runtime.email
|
||||
|
||||
vpc_access {
|
||||
connector = google_vpc_access_connector.this.id
|
||||
egress = "PRIVATE_RANGES_ONLY"
|
||||
}
|
||||
|
||||
containers {
|
||||
image = local.migrations_image
|
||||
|
||||
# Prisma's Node + Rust engine plus the v2 migration resolver
# routinely peaks above 1 GiB while applying the schema, so 2 GiB
# is the floor (1 GiB OOM-kills mid-migrate); the limit below is set
# to 4 GiB. CPU stays at 1 vCPU (Cloud Run requires >= 1 with
# concurrency > 1, and `prisma migrate deploy` is single-threaded so
# more buys nothing).
|
||||
resources {
|
||||
limits = {
|
||||
cpu = "1000m"
|
||||
memory = "4Gi"
|
||||
}
|
||||
}
|
||||
|
||||
dynamic "env" {
|
||||
for_each = local.migrations_env_kv
|
||||
content {
|
||||
name = env.value.name
|
||||
value = env.value.value
|
||||
}
|
||||
}
|
||||
|
||||
dynamic "env" {
|
||||
for_each = local.migrations_env_secrets
|
||||
content {
|
||||
name = env.value.name
|
||||
value_source {
|
||||
secret_key_ref {
|
||||
secret = env.value.secret
|
||||
version = env.value.version
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
depends_on = [
|
||||
google_secret_manager_secret_iam_member.db_password,
|
||||
google_sql_user.app,
|
||||
]
|
||||
}
|
||||
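The startup command assembled in `gateway_args` and `backend_args` above runs three steps in order: decode the base64 env vars to files, export the two connection URLs, then exec the server. A minimal Python sketch of the first two steps follows, assuming the env var names used in this file. The helper names `write_b64_env_to_file` and `build_database_url` are hypothetical and not part of LiteLLM.

```python
import base64
import os
import pathlib


def write_b64_env_to_file(b64_var: str, path_var: str, env=os.environ) -> pathlib.Path:
    """Decode the base64 payload in env[b64_var] and write it to the path in env[path_var]."""
    target = pathlib.Path(env[path_var])
    target.write_bytes(base64.b64decode(env[b64_var]))
    return target


def build_database_url(env=os.environ, replica: bool = False) -> str:
    """Assemble a postgresql:// URL from the discrete DATABASE_* variables."""
    # The replica URL swaps only host and port; user, password and db name are shared.
    suffix = "_READ_REPLICA" if replica else ""
    return "postgresql://{user}:{password}@{host}:{port}/{name}".format(
        user=env["DATABASE_USER"],
        password=env["DATABASE_PASSWORD"],
        host=env["DATABASE_HOST" + suffix],
        port=env["DATABASE_PORT" + suffix],
        name=env["DATABASE_NAME"],
    )
```

Plain string interpolation of the password is only safe because `random_password.db_password` in cloudsql.tf sets `special = false`; a password containing URL-reserved characters would need percent-encoding first.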
102
terraform/litellm/gcp/cloudsql.tf
Normal file
@ -0,0 +1,102 @@
|
||||
# Cloud SQL for PostgreSQL — one primary + one read replica.
|
||||
#
|
||||
# Note on auth: LiteLLM's IAM-auth helper (rds_iam_token.py) mints AWS RDS
|
||||
# tokens via boto3 and doesn't speak GCP IAM. Cloud SQL IAM auth from Cloud
|
||||
# Run requires the Cloud SQL Auth Proxy as a sidecar, which complicates the
|
||||
# Cloud Run service spec. We instead use password auth: a random password
|
||||
# lives in Secret Manager and is injected into the Cloud Run services as
|
||||
# DATABASE_PASSWORD. The writer's DATABASE_URL is assembled inside the
|
||||
# container at startup; the reader URL is built from the replica's IP.
|
||||
|
||||
resource "google_sql_database_instance" "writer" {
|
||||
name = local.name
|
||||
region = var.region
|
||||
database_version = var.db_version
|
||||
|
||||
depends_on = [google_service_networking_connection.psa]
|
||||
|
||||
settings {
|
||||
# ENTERPRISE accepts the db-custom-* and db-n1-* tiers we default to.
# ENTERPRISE_PLUS only accepts db-perf-optimized-* and is ~3x cost. To
# use it, set var.db_edition = "ENTERPRISE_PLUS" and change var.db_tier
# to a matching tier in the same apply.
|
||||
edition = var.db_edition
|
||||
tier = var.db_tier
|
||||
availability_type = "REGIONAL"
|
||||
disk_size = 20
|
||||
disk_autoresize = true
|
||||
|
||||
backup_configuration {
|
||||
enabled = true
|
||||
point_in_time_recovery_enabled = true
|
||||
start_time = "07:00"
|
||||
}
|
||||
|
||||
ip_configuration {
|
||||
ipv4_enabled = false
|
||||
private_network = google_compute_network.this.id
|
||||
}
|
||||
|
||||
insights_config {
|
||||
query_insights_enabled = true
|
||||
record_application_tags = true
|
||||
record_client_address = true
|
||||
}
|
||||
}
|
||||
|
||||
deletion_protection = var.cloudsql_deletion_protection
|
||||
}
|
||||
|
||||
resource "google_sql_database_instance" "reader" {
|
||||
name = "${local.name}-reader"
|
||||
region = var.region
|
||||
database_version = var.db_version
|
||||
master_instance_name = google_sql_database_instance.writer.name
|
||||
|
||||
depends_on = [google_service_networking_connection.psa]
|
||||
|
||||
settings {
|
||||
edition = var.db_edition
|
||||
tier = var.db_tier
|
||||
availability_type = "ZONAL"
|
||||
disk_autoresize = true
|
||||
|
||||
ip_configuration {
|
||||
ipv4_enabled = false
|
||||
private_network = google_compute_network.this.id
|
||||
}
|
||||
}
|
||||
|
||||
deletion_protection = var.cloudsql_deletion_protection
|
||||
}
|
||||
|
||||
resource "google_sql_database" "this" {
|
||||
name = var.db_name
|
||||
instance = google_sql_database_instance.writer.name
|
||||
}
|
||||
|
||||
resource "random_password" "db_password" {
|
||||
length = 32
|
||||
special = false
|
||||
min_lower = 4
|
||||
min_upper = 4
|
||||
min_numeric = 4
|
||||
}
|
||||
|
||||
resource "google_sql_user" "app" {
|
||||
name = var.db_username
|
||||
instance = google_sql_database_instance.writer.name
|
||||
password = random_password.db_password.result
|
||||
}
|
||||
|
||||
resource "google_secret_manager_secret" "db_password" {
|
||||
secret_id = "${local.name}-db-password"
|
||||
replication {
|
||||
auto {}
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_secret_manager_secret_version" "db_password" {
|
||||
secret = google_secret_manager_secret.db_password.id
|
||||
secret_data = random_password.db_password.result
|
||||
}
|
||||
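The `random_password.db_password` settings above (32 characters, no specials, at least four each of lower, upper, and numeric) produce a password that can be dropped into a connection URL without percent-encoding. The Python sketch below illustrates that property only. It is not how the Terraform `random` provider generates passwords, and `generate_password` and `is_url_safe` are hypothetical helpers.

```python
import secrets
import string
from urllib.parse import quote


def generate_password(length: int = 32, min_lower: int = 4,
                      min_upper: int = 4, min_numeric: int = 4) -> str:
    """Return an alphanumeric password meeting the minimum per-class counts."""
    if min_lower + min_upper + min_numeric > length:
        raise ValueError("minimum counts exceed requested length")
    alphabet = string.ascii_lowercase + string.ascii_uppercase + string.digits
    remaining = length - min_lower - min_upper - min_numeric
    chars = (
        [secrets.choice(string.ascii_lowercase) for _ in range(min_lower)]
        + [secrets.choice(string.ascii_uppercase) for _ in range(min_upper)]
        + [secrets.choice(string.digits) for _ in range(min_numeric)]
        + [secrets.choice(alphabet) for _ in range(remaining)]
    )
    # Shuffle with a CSPRNG so the guaranteed characters are not clustered at the front.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)


def is_url_safe(password: str) -> bool:
    """True when percent-encoding the password leaves it unchanged."""
    return quote(password, safe="") == password
```

A password such as `p@ss/word` fails `is_url_safe`, which is the case the `special = false` setting avoids.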
29
terraform/litellm/gcp/gcs.tf
Normal file
@ -0,0 +1,29 @@
|
||||
# General-purpose GCS bucket — same role as the AWS S3 bucket. The bucket
|
||||
# name is exposed to gateway + backend as GCS_BUCKET_NAME; reference it
|
||||
# from proxy_config via `os.environ/GCS_BUCKET_NAME`.
|
||||
|
||||
resource "random_id" "bucket_suffix" {
|
||||
byte_length = 4
|
||||
}
|
||||
|
||||
resource "google_storage_bucket" "this" {
|
||||
name = "${var.project}-${local.name}-${random_id.bucket_suffix.hex}"
|
||||
location = var.region
|
||||
uniform_bucket_level_access = true
|
||||
force_destroy = var.gcs_force_destroy
|
||||
|
||||
versioning {
|
||||
enabled = true
|
||||
}
|
||||
|
||||
public_access_prevention = "enforced"
|
||||
|
||||
labels = var.labels
|
||||
}
|
||||
|
||||
# Cloud Run runtime SA gains object admin on this bucket only.
|
||||
resource "google_storage_bucket_iam_member" "runtime" {
|
||||
bucket = google_storage_bucket.this.name
|
||||
role = "roles/storage.objectAdmin"
|
||||
member = "serviceAccount:${google_service_account.runtime.email}"
|
||||
}
|
||||
71
terraform/litellm/gcp/iam.tf
Normal file
@ -0,0 +1,71 @@
|
||||
# Runtime SA used by the gateway, backend, and migration job — has Cloud
|
||||
# SQL client + Secret Manager accessor on every managed/extra secret. The
|
||||
# UI deliberately uses a *different* SA (below) so a compromised UI
|
||||
# container can't read master_key / db_password / license / ui_password /
|
||||
# provider creds via the metadata service.
|
||||
resource "google_service_account" "runtime" {
|
||||
account_id = "${local.name}-runtime"
|
||||
display_name = "LiteLLM Cloud Run runtime"
|
||||
}
|
||||
|
||||
# UI runtime SA — no role bindings. The UI is static nginx with no DB,
|
||||
# Redis, or Secret Manager dependencies, so its service account should not
|
||||
# be able to read any of those. Cloud Run pulls the UI image via the
|
||||
# project's serverless service agent (not this SA), so it doesn't need
|
||||
# artifactregistry.reader either.
|
||||
resource "google_service_account" "ui_runtime" {
|
||||
account_id = "${local.name}-ui-runtime"
|
||||
display_name = "LiteLLM Cloud Run UI runtime (no data-plane access)"
|
||||
}
|
||||
|
||||
# Cloud SQL client — lets the Cloud Run services connect to the instance
|
||||
# over private IP via the VPC connector.
|
||||
resource "google_project_iam_member" "runtime_cloudsql" {
|
||||
project = var.project
|
||||
role = "roles/cloudsql.client"
|
||||
member = "serviceAccount:${google_service_account.runtime.email}"
|
||||
}
|
||||
|
||||
# Secret Manager accessor — managed secrets first (split out as separate
|
||||
# resources because their IDs are computed-at-apply and can't drive a
|
||||
# for_each).
|
||||
resource "google_secret_manager_secret_iam_member" "master_key" {
|
||||
secret_id = google_secret_manager_secret.master_key.id
|
||||
role = "roles/secretmanager.secretAccessor"
|
||||
member = "serviceAccount:${google_service_account.runtime.email}"
|
||||
}
|
||||
|
||||
resource "google_secret_manager_secret_iam_member" "db_password" {
|
||||
secret_id = google_secret_manager_secret.db_password.id
|
||||
role = "roles/secretmanager.secretAccessor"
|
||||
member = "serviceAccount:${google_service_account.runtime.email}"
|
||||
}
|
||||
|
||||
# License secret accessor — only created when var.litellm_license is set.
|
||||
resource "google_secret_manager_secret_iam_member" "license" {
|
||||
count = var.litellm_license == "" ? 0 : 1
|
||||
|
||||
secret_id = google_secret_manager_secret.license[0].id
|
||||
role = "roles/secretmanager.secretAccessor"
|
||||
member = "serviceAccount:${google_service_account.runtime.email}"
|
||||
}
|
||||
|
||||
# UI password secret accessor — only created when var.ui_password is set.
|
||||
resource "google_secret_manager_secret_iam_member" "ui_password" {
|
||||
count = var.ui_password == "" ? 0 : 1
|
||||
|
||||
secret_id = google_secret_manager_secret.ui_password[0].id
|
||||
role = "roles/secretmanager.secretAccessor"
|
||||
member = "serviceAccount:${google_service_account.runtime.email}"
|
||||
}
|
||||
|
||||
# User-supplied extras. Dedupe on the secret resource ID — two different
|
||||
# env-var names could reference the same secret, and we want exactly one
|
||||
# IAM binding per (secret, role, member) tuple in state.
|
||||
resource "google_secret_manager_secret_iam_member" "extras" {
|
||||
for_each = toset(concat(values(var.gateway_extra_secrets), values(var.backend_extra_secrets)))
|
||||
|
||||
secret_id = each.value
|
||||
role = "roles/secretmanager.secretAccessor"
|
||||
member = "serviceAccount:${google_service_account.runtime.email}"
|
||||
}
|
||||
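The `extras` binding above is keyed on secret IDs rather than env-var names, so a secret referenced under several names still gets a single IAM member. A minimal Python sketch of that dedupe follows. The function name, env-var names, and secret IDs are all hypothetical.

```python
def secrets_needing_binding(gateway_extra_secrets: dict, backend_extra_secrets: dict) -> set:
    """Collect the distinct secret IDs referenced by either component."""
    return set(gateway_extra_secrets.values()) | set(backend_extra_secrets.values())


# Hypothetical inputs: env-var name -> Secret Manager secret ID.
gateway = {
    "OPENAI_API_KEY": "projects/p/secrets/openai",
    "AZURE_API_KEY": "projects/p/secrets/azure",
}
backend = {
    # Same secret as the gateway's OPENAI_API_KEY, exposed under another name.
    "OPENAI_KEY_FOR_HEALTHCHECKS": "projects/p/secrets/openai",
}

bindings = secrets_needing_binding(gateway, backend)
```

Three env-var entries collapse to two bindings here, one per distinct secret.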
185
terraform/litellm/gcp/load_balancer.tf
Normal file
@ -0,0 +1,185 @@
|
||||
# External global HTTP(S) load balancer fronting all three Cloud Run
|
||||
# services. URL map mirrors the helm-chart ingress path routing:
|
||||
# - LLM data-plane paths → gateway
|
||||
# - UI asset paths → ui
|
||||
# - Everything else → backend (management API: /key/*, /user/*, …)
|
||||
#
|
||||
# Without var.lb_domains the LB serves plain HTTP on port 80, which must be
# opted into with var.allow_plaintext_lb = true. Set var.lb_domains to a
# list of DNS names already pointing at lb_ip and the stack provisions a
# Google-managed SSL cert + 443 forwarding rule, and the port-80 proxy
# switches to a redirect-only URL map that sends HTTP to HTTPS.
|
||||
|
||||
locals {
|
||||
tls_enabled = length(var.lb_domains) > 0
|
||||
}
|
||||
|
||||
resource "google_compute_global_address" "lb" {
|
||||
name = "${local.name}-lb-ip"
|
||||
}
|
||||
|
||||
# Serverless NEGs — one per Cloud Run service.
|
||||
resource "google_compute_region_network_endpoint_group" "gateway" {
|
||||
name = "${local.name}-gateway-neg"
|
||||
region = var.region
|
||||
network_endpoint_type = "SERVERLESS"
|
||||
|
||||
cloud_run {
|
||||
service = google_cloud_run_v2_service.gateway.name
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_compute_region_network_endpoint_group" "backend" {
|
||||
name = "${local.name}-backend-neg"
|
||||
region = var.region
|
||||
network_endpoint_type = "SERVERLESS"
|
||||
|
||||
cloud_run {
|
||||
service = google_cloud_run_v2_service.backend.name
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_compute_region_network_endpoint_group" "ui" {
|
||||
name = "${local.name}-ui-neg"
|
||||
region = var.region
|
||||
network_endpoint_type = "SERVERLESS"
|
||||
|
||||
cloud_run {
|
||||
service = google_cloud_run_v2_service.ui.name
|
||||
}
|
||||
}
|
||||
|
||||
# Backend services wrap each NEG.
|
||||
resource "google_compute_backend_service" "gateway" {
|
||||
name = "${local.name}-gateway-bs"
|
||||
protocol = "HTTP"
|
||||
load_balancing_scheme = "EXTERNAL_MANAGED"
|
||||
|
||||
backend {
|
||||
group = google_compute_region_network_endpoint_group.gateway.id
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_compute_backend_service" "backend" {
|
||||
name = "${local.name}-backend-bs"
|
||||
protocol = "HTTP"
|
||||
load_balancing_scheme = "EXTERNAL_MANAGED"
|
||||
|
||||
backend {
|
||||
group = google_compute_region_network_endpoint_group.backend.id
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_compute_backend_service" "ui" {
|
||||
name = "${local.name}-ui-bs"
|
||||
protocol = "HTTP"
|
||||
load_balancing_scheme = "EXTERNAL_MANAGED"
|
||||
|
||||
backend {
|
||||
group = google_compute_region_network_endpoint_group.ui.id
|
||||
}
|
||||
}
|
||||
|
||||
# URL map. Default → backend (management API). Path matchers route the
|
||||
# gateway and UI prefixes elsewhere.
|
||||
resource "google_compute_url_map" "this" {
|
||||
name = local.name
|
||||
default_service = google_compute_backend_service.backend.id
|
||||
|
||||
host_rule {
|
||||
hosts = ["*"]
|
||||
path_matcher = "main"
|
||||
}
|
||||
|
||||
path_matcher {
|
||||
name = "main"
|
||||
default_service = google_compute_backend_service.backend.id
|
||||
|
||||
# UI paths. Rule order does not affect matching: URL maps pick the
# longest matching path, so the exact entries / and /favicon.ico only
# match those two paths and never shadow the gateway globs.
|
||||
path_rule {
|
||||
paths = local.ui_path_prefixes
|
||||
service = google_compute_backend_service.ui.id
|
||||
}
|
||||
|
||||
# Gateway path prefixes. GCP URL maps cap a path_rule at 10 path globs,
|
||||
# so chunk into rules of 10.
|
||||
dynamic "path_rule" {
|
||||
for_each = { for idx, chunk in chunklist(local.gateway_path_prefixes, 10) : idx => chunk }
|
||||
content {
|
||||
paths = path_rule.value
|
||||
service = google_compute_backend_service.gateway.id
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Permanent HTTP→HTTPS redirect URL map. Only attached to the port-80
|
||||
# target proxy when TLS is enabled; otherwise the regular path-routing
|
||||
# URL map is attached to the HTTP proxy and everything stays plaintext.
|
||||
resource "google_compute_url_map" "https_redirect" {
|
||||
count = local.tls_enabled ? 1 : 0
|
||||
name = "${local.name}-redirect"
|
||||
|
||||
default_url_redirect {
|
||||
https_redirect = true
|
||||
redirect_response_code = "MOVED_PERMANENTLY_DEFAULT"
|
||||
strip_query = false
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_compute_target_http_proxy" "this" {
|
||||
name = "${local.name}-http"
|
||||
url_map = local.tls_enabled ? google_compute_url_map.https_redirect[0].id : google_compute_url_map.this.id
|
||||
|
||||
# Default-deny on the HTTP-only path: TLS is the supported posture.
|
||||
# Operators must either supply DNS names or explicitly opt in.
|
||||
lifecycle {
|
||||
precondition {
|
||||
condition = local.tls_enabled || var.allow_plaintext_lb
|
||||
error_message = "LB has no HTTPS forwarding rule. Either set `lb_domains` to a list of DNS names you want a Google-managed cert for, or set `allow_plaintext_lb = true` to opt into HTTP-only (trial / dev only)."
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_compute_global_forwarding_rule" "http" {
|
||||
name = "${local.name}-http"
|
||||
ip_protocol = "TCP"
|
||||
port_range = "80"
|
||||
load_balancing_scheme = "EXTERNAL_MANAGED"
|
||||
ip_address = google_compute_global_address.lb.address
|
||||
target = google_compute_target_http_proxy.this.id
|
||||
}
|
||||
|
||||
# ---------- HTTPS (gated on var.lb_domains) ----------
|
||||
#
|
||||
# Google-managed certs require each listed domain to resolve to lb_ip
|
||||
# *before* the cert provisions; on first apply the cert sits in
|
||||
# PROVISIONING for ~15-60 min until DNS propagates. The LB starts serving
|
||||
# 443 immediately, but cert handshakes fail until the managed cert
|
||||
# transitions to ACTIVE.
|
||||
|
||||
resource "google_compute_managed_ssl_certificate" "this" {
|
||||
count = local.tls_enabled ? 1 : 0
|
||||
name = "${local.name}-cert"
|
||||
|
||||
managed {
|
||||
domains = var.lb_domains
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_compute_target_https_proxy" "this" {
|
||||
count = local.tls_enabled ? 1 : 0
|
||||
name = "${local.name}-https"
|
||||
url_map = google_compute_url_map.this.id
|
||||
ssl_certificates = [google_compute_managed_ssl_certificate.this[0].id]
|
||||
}
|
||||
|
||||
resource "google_compute_global_forwarding_rule" "https" {
|
||||
count = local.tls_enabled ? 1 : 0
|
||||
name = "${local.name}-https"
|
||||
ip_protocol = "TCP"
|
||||
port_range = "443"
|
||||
load_balancing_scheme = "EXTERNAL_MANAGED"
|
||||
ip_address = google_compute_global_address.lb.address
|
||||
target = google_compute_target_https_proxy.this[0].id
|
||||
}
|
||||
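The dynamic `path_rule` block above splits `local.gateway_path_prefixes` into groups of 10 and emits one rule per group, keyed by the group's index. A minimal Python sketch of that chunk-and-index step follows. The `chunklist` function here is a simplified stand-in for Terraform's built-in of the same name, `path_rules` is a hypothetical helper, and the sample globs are illustrative.

```python
def chunklist(items: list, size: int) -> list:
    """Split items into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def path_rules(prefixes: list, size: int = 10) -> dict:
    """Map chunk index -> list of path globs, mirroring the for_each keys."""
    return {idx: chunk for idx, chunk in enumerate(chunklist(prefixes, size))}


# 23 illustrative globs produce three rules: two full ones and a remainder.
sample = [f"/v1/route{i}/*" for i in range(23)]
rules = path_rules(sample)
```

Every glob lands in exactly one rule and the original order is preserved, so adding a prefix to the list only changes the last chunk.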
79
terraform/litellm/gcp/locals.tf
Normal file
@ -0,0 +1,79 @@
|
||||
# Gateway path prefixes, mirrored verbatim from gateway/routes/allowlist.py
# and helm/litellm/templates/ingress.yaml. URL map path matchers take
# path_rule blocks with `paths` lists; up to 10 path globs per rule, up to 50
# rules per matcher, so the gateway list is split into chunks of 10.
|
||||
locals {
|
||||
# Every resource the stack creates is named `${tenant}-litellm-${env}`
|
||||
# (or that with a per-resource suffix). Computed once here so the rest of
|
||||
# the stack can reference local.name.
|
||||
name = "${var.tenant}-litellm-${var.env}"
|
||||
|
||||
gateway_path_prefixes = [
|
||||
"/v1/chat/*", "/chat/*",
|
||||
"/v1/completions*", "/completions*",
|
||||
"/v1/embeddings*", "/embeddings*",
|
||||
"/v1/moderations*", "/moderations*",
|
||||
"/v1/audio/*", "/audio/*",
|
||||
"/v1/images/*", "/images/*",
|
||||
"/v1/files*", "/files*",
|
||||
"/v1/batches*", "/batches*",
|
||||
"/v1/fine_tuning/*", "/fine_tuning/*",
|
||||
"/v1/fine-tuning/*", "/fine-tuning/*",
|
||||
"/v1/responses*", "/responses*",
|
||||
"/v1/threads*", "/threads*",
|
||||
"/v1/assistants*", "/assistants*",
|
||||
"/v1/vector_stores*", "/vector_stores*",
|
||||
"/v1/indexes*",
|
||||
"/v1/models*", "/models*",
|
||||
"/openai/*", "/engines/*",
|
||||
"/v1/messages*", "/messages*",
|
||||
"/v1/skills/*", "/v1/a2a/*",
|
||||
"/v1/rerank*", "/v2/rerank*", "/rerank*",
|
||||
"/v1/ocr*", "/ocr*",
|
||||
"/v1/rag/*", "/rag/*",
|
||||
"/v1/video/*", "/v1/videos/*", "/video/*", "/videos/*",
|
||||
"/v1/search*", "/search*",
|
||||
"/v1/containers/*", "/containers/*",
|
||||
"/v1/evals/*",
|
||||
"/v1/memory/*",
|
||||
"/queue/chat/*",
|
||||
"/v1beta/*",
|
||||
"/interactions/*",
|
||||
"/anthropic/*", "/azure/*", "/azure_ai/*", "/aws/*", "/bedrock/*",
|
||||
"/cohere/*", "/gemini/*", "/google/*",
|
||||
"/vertex_ai/*", "/vertex-ai/*",
|
||||
"/assemblyai/*", "/eu.assemblyai/*",
|
||||
"/langfuse/*", "/vllm/*",
|
||||
"/mistral/*", "/groq/*", "/voyage/*", "/cursor/*", "/milvus/*",
|
||||
"/openai_passthrough/*",
|
||||
"/toolset/*",
|
||||
"/v1/realtime*", "/realtime*",
|
||||
"/health*", "/metrics", "/test*",
|
||||
]
|
||||
|
||||
ui_path_prefixes = [
|
||||
"/",
|
||||
"/favicon.ico",
|
||||
"/litellm-asset-prefix/*",
|
||||
"/_next/*",
|
||||
"/assets/*",
|
||||
"/ui",
|
||||
"/ui/*",
|
||||
]
|
||||
|
||||
proxy_config_enabled = length(keys(var.proxy_config)) > 0
|
||||
proxy_config_b64 = local.proxy_config_enabled ? base64encode(yamlencode(var.proxy_config)) : ""
|
||||
|
||||
proxy_config_env = local.proxy_config_enabled ? [
|
||||
{ name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 },
|
||||
{ name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
|
||||
] : []
|
||||
|
||||
# Resolved image URIs: per-component override wins, otherwise compose
|
||||
# from image_registry + image_tag. Cloud Run only accepts AR / gcr.io /
|
||||
# docker.io paths — see variables.tf for the full constraint list.
|
||||
gateway_image = var.gateway_image != "" ? var.gateway_image : "${var.image_registry}/litellm-gateway:${var.image_tag}"
|
||||
backend_image = var.backend_image != "" ? var.backend_image : "${var.image_registry}/litellm-backend:${var.image_tag}"
|
||||
ui_image = var.ui_image != "" ? var.ui_image : "${var.image_registry}/litellm-ui:${var.image_tag}"
|
||||
migrations_image = var.migrations_image != "" ? var.migrations_image : "${var.image_registry}/litellm-migrations:${var.image_tag}"
|
||||
}
|
||||
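The four `*_image` locals above share one rule: a non-empty per-component override wins, otherwise the URI is composed from `image_registry`, the component name, and `image_tag`. A minimal Python sketch of that rule follows. `resolve_image` is a hypothetical helper, and the registry path and tag in the usage lines are made-up values.

```python
def resolve_image(override: str, registry: str, component: str, tag: str) -> str:
    """Return the override when set, else compose <registry>/litellm-<component>:<tag>."""
    return override if override != "" else f"{registry}/litellm-{component}:{tag}"


# No override: compose from the shared registry and tag (illustrative values).
composed = resolve_image("", "us-docker.pkg.dev/example/litellm", "gateway", "v1")

# Override set: it is used verbatim and the registry and tag are ignored.
pinned = resolve_image("docker.io/acme/gw:pinned", "us-docker.pkg.dev/example/litellm", "gateway", "v1")
```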
46
terraform/litellm/gcp/network.tf
Normal file
@ -0,0 +1,46 @@
resource "google_compute_network" "this" {
  name                    = local.name
  auto_create_subnetworks = false
  routing_mode            = "REGIONAL"
}

resource "google_compute_subnetwork" "this" {
  name                     = "${local.name}-${var.region}"
  region                   = var.region
  network                  = google_compute_network.this.id
  ip_cidr_range            = var.subnet_cidr
  private_ip_google_access = true
}

# Private Services Access (PSA) range for Cloud SQL + Memorystore. Both
# managed services peer with the VPC over the connection below using
# addresses from this range.
resource "google_compute_global_address" "psa" {
  name          = "${local.name}-psa"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = google_compute_network.this.id
}

resource "google_service_networking_connection" "psa" {
  network                 = google_compute_network.this.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.psa.name]
}

# Serverless VPC Access connector — required so Cloud Run can reach
# Cloud SQL / Memorystore private IPs via the PSA range.
#
# min/max instances are required by the API now (you can't just set
# machine_type alone). Defaults: 2 e2-micro instances scale up to 3 — fine
# for low-to-moderate Cloud Run egress; bump max if your services push
# heavy private-network traffic.
resource "google_vpc_access_connector" "this" {
  name          = "${local.name}-conn"
  region        = var.region
  network       = google_compute_network.this.name
  ip_cidr_range = var.vpc_connector_cidr
  min_instances = 2
  max_instances = 3
}
64
terraform/litellm/gcp/outputs.tf
Normal file
@ -0,0 +1,64 @@
output "lb_ip" {
  description = "Global anycast IP of the external HTTPS load balancer."
  value       = google_compute_global_address.lb.address
}

output "lb_url" {
  description = "Proxy URL. Switches scheme based on whether lb_domains is set; when TLS is enabled the URL points at the first listed domain (since managed certs are tied to the hostname, not the anycast IP). The dashboard is served at /, the API at /v1/*."
  value       = local.tls_enabled ? "https://${var.lb_domains[0]}" : "http://${google_compute_global_address.lb.address}"
}

output "gateway_service_url" {
  description = "Default Cloud Run URL for the gateway (bypasses the LB)."
  value       = google_cloud_run_v2_service.gateway.uri
}

output "backend_service_url" {
  description = "Default Cloud Run URL for the backend (bypasses the LB)."
  value       = google_cloud_run_v2_service.backend.uri
}

output "ui_service_url" {
  description = "Default Cloud Run URL for the UI (bypasses the LB)."
  value       = google_cloud_run_v2_service.ui.uri
}

output "cloudsql_writer_ip" {
  description = "Private IP of the Cloud SQL writer."
  value       = google_sql_database_instance.writer.private_ip_address
}

output "cloudsql_reader_ip" {
  description = "Private IP of the Cloud SQL read replica."
  value       = google_sql_database_instance.reader.private_ip_address
}

output "redis_endpoint" {
  description = "Memorystore Redis endpoint."
  value       = "${google_redis_instance.this.host}:${google_redis_instance.this.port}"
}

output "gcs_bucket" {
  description = "GCS bucket name. Exposed to gateway + backend as GCS_BUCKET_NAME. Reference from proxy_config via `os.environ/GCS_BUCKET_NAME`."
  value       = google_storage_bucket.this.name
}

output "master_key_secret_id" {
  description = "Secret Manager resource ID holding LITELLM_MASTER_KEY. Fetch with `gcloud secrets versions access latest --secret=<id>`."
  value       = google_secret_manager_secret.master_key.secret_id
}

output "db_password_secret_id" {
  description = "Secret Manager resource ID holding the Cloud SQL app-user password."
  value       = google_secret_manager_secret.db_password.secret_id
}

output "migration_run_command" {
  description = "Shell command that executes the one-off migration job against Cloud SQL. Run this once after the first apply."
  value = format(
    "gcloud run jobs execute %s --region %s --project %s --wait",
    google_cloud_run_v2_job.migrations.name,
    var.region,
    var.project,
  )
}
9
terraform/litellm/gcp/providers.tf
Normal file
@ -0,0 +1,9 @@
provider "google" {
  project = var.project
  region  = var.region
}

provider "google-beta" {
  project = var.project
  region  = var.region
}
20
terraform/litellm/gcp/redis.tf
Normal file
@ -0,0 +1,20 @@
resource "google_redis_instance" "this" {
  name           = local.name
  tier           = var.redis_tier
  memory_size_gb = var.redis_memory_size_gb
  region         = var.region

  authorized_network = google_compute_network.this.id
  connect_mode       = "PRIVATE_SERVICE_ACCESS"

  redis_version = "REDIS_7_0"

  # In-transit encryption between Cloud Run and Memorystore. The instance
  # exposes its self-signed CA via `server_ca_certs` (read in cloudrun.tf
  # and passed to the proxy as REDIS_CA_PEM_B64); the proxy decodes it to
  # /tmp/redis-ca.pem at startup and uses it to validate the rediss://
  # handshake. Mirrors `transit_encryption_enabled = true` on AWS.
  transit_encryption_mode = "SERVER_AUTHENTICATION"

  depends_on = [google_service_networking_connection.psa]
}
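
# Sketch only: the real wiring lives in cloudrun.tf, which is not shown
# here. Assuming the first entry of `server_ca_certs` is the active CA, the
# value passed as REDIS_CA_PEM_B64 could be derived roughly like this. The
# local name `redis_ca_pem_b64_example` is hypothetical and not part of the
# stack.
#
# locals {
#   # `server_ca_certs` is a list of objects; `cert` holds the PEM text.
#   redis_ca_pem_b64_example = base64encode(google_redis_instance.this.server_ca_certs[0].cert)
# }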
62
terraform/litellm/gcp/secrets.tf
Normal file
@ -0,0 +1,62 @@
resource "random_password" "master_key" {
  length      = 48
  special     = false
  min_lower   = 4
  min_upper   = 4
  min_numeric = 4
}

# LITELLM_MASTER_KEY (sk-…) lives in Secret Manager. The Cloud Run service
# account gets accessor permission on it (see iam.tf).
resource "google_secret_manager_secret" "master_key" {
  secret_id = "${local.name}-master-key"
  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "master_key" {
  secret = google_secret_manager_secret.master_key.id
  # When the operator passes litellm_master_key, use it verbatim. Otherwise
  # fall back to the auto-generated `sk-…` value (trial / OSS path).
  secret_data = coalesce(var.litellm_master_key, "sk-${random_password.master_key.result}")
}
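
# Note on the coalesce() call above: Terraform's coalesce() returns the
# first argument that is neither null nor an empty string, which is why the
# "" default of var.litellm_master_key falls through to the generated key.
# Illustrative `terraform console` evaluations (the values are made up):
#
#   coalesce("", "sk-generated")            # "sk-generated"
#   coalesce("sk-custom", "sk-generated")   # "sk-custom"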

# LITELLM_LICENSE — only created when the operator supplies one. The runtime
# SA gets accessor permission via iam.tf, and gateway + backend pick it up
# through shared_env_secrets in cloudrun.tf.
resource "google_secret_manager_secret" "license" {
  count = var.litellm_license == "" ? 0 : 1

  secret_id = "${local.name}-license"
  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "license" {
  count = var.litellm_license == "" ? 0 : 1

  secret      = google_secret_manager_secret.license[0].id
  secret_data = var.litellm_license
}

# UI_PASSWORD — backend-only. Same pattern as license: only created when
# the operator supplies one. The runtime SA gets accessor permission via
# iam.tf, and the backend service picks the env var up through
# backend_managed_env_secrets in cloudrun.tf.
resource "google_secret_manager_secret" "ui_password" {
  count = var.ui_password == "" ? 0 : 1

  secret_id = "${local.name}-ui-password"
  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "ui_password" {
  count = var.ui_password == "" ? 0 : 1

  secret      = google_secret_manager_secret.ui_password[0].id
  secret_data = var.ui_password
}
77
terraform/litellm/gcp/terraform.tfvars.example
Normal file
@ -0,0 +1,77 @@
project = "my-gcp-project"
region  = "us-central1"

# Resource naming: every GCP resource the stack creates is named
# `${tenant}-litellm-${env}` (or that plus a per-resource suffix). E.g.
# tenant="acme" + env="stage" → Cloud Run service `acme-litellm-stage-gateway`,
# Cloud SQL instance `acme-litellm-stage`, etc.
tenant = "acme"
env    = "stage"

# Tenant-supplied secrets. Prefer TF_VAR_litellm_master_key /
# TF_VAR_litellm_license / TF_VAR_ui_password env vars so the values don't
# end up in a committed tfvars file. All three are optional — when
# omitted the stack auto-generates a master key, runs without a license,
# and falls back to LITELLM_MASTER_KEY for UI login.
# litellm_master_key = "sk-..."
# litellm_license    = "lic-..."
# ui_password        = "..."

# TLS: provide DNS names already pointing at the LB IP for a Google-managed
# cert. Without one, plan fails unless allow_plaintext_lb = true is set
# explicitly (trial/dev only).
# lb_domains         = ["proxy.example.com"]
# allow_plaintext_lb = true

# Storage and database retention. Defaults are safe — destroy preserves
# data. Flip these only for ephemeral / CI stacks.
# cloudsql_deletion_protection = true   # default: refuse destroy on the DB
# gcs_force_destroy            = false  # default: refuse destroy on a non-empty bucket

# Component images. Defaults pin all four to the same GHCR release tag —
# bump them together when bumping LiteLLM. To use private images, mirror
# them into Artifact Registry first — Cloud Run only authenticates against
# AR / gcr.io.
# gateway_image    = "us-central1-docker.pkg.dev/my-gcp-project/litellm/gateway:1.86.0-dev"
# backend_image    = "us-central1-docker.pkg.dev/my-gcp-project/litellm/backend:1.86.0-dev"
# ui_image         = "us-central1-docker.pkg.dev/my-gcp-project/litellm/ui:1.86.0-dev"
# migrations_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/migrations:1.86.0-dev"

# ---------- proxy_config (mirrors helm gateway.config.proxy_config) ----------
# proxy_config = {
#   model_list = [
#     {
#       model_name = "gpt-4o"
#       litellm_params = {
#         model   = "openai/gpt-4o"
#         api_key = "os.environ/OPENAI_API_KEY"
#       }
#     },
#   ]
#   general_settings = {
#     master_key   = "os.environ/LITELLM_MASTER_KEY"
#     database_url = "os.environ/DATABASE_URL"
#   }
# }

# ---------- Extra env / secrets ----------
# Plain-text env vars (non-sensitive). Land directly in the Cloud Run service spec.
# gateway_extra_env = {
#   LANGFUSE_HOST = "https://us.cloud.langfuse.com"
# }

# Backend env vars commonly tuned in prod: SSO redirect, docs branding,
# UI admin username. UI_PASSWORD is its own first-class var (see top).
# backend_extra_env = {
#   AUTO_REDIRECT_UI_LOGIN_TO_SSO = "true"
#   DOCS_TITLE                    = "Acme LiteLLM"
#   UI_USERNAME                   = "admin"
# }

# Provider API keys — Secret Manager resource IDs (NOT secret values). The
# Cloud Run SA auto-gains roles/secretmanager.secretAccessor on every
# secret listed here. Same shape works for backend_extra_secrets.
# gateway_extra_secrets = {
#   OPENAI_API_KEY    = "projects/my-gcp-project/secrets/openai-api-key"
#   ANTHROPIC_API_KEY = "projects/my-gcp-project/secrets/anthropic-api-key"
# }
405
terraform/litellm/gcp/variables.tf
Normal file
@ -0,0 +1,405 @@
variable "project" {
  description = "GCP project ID."
  type        = string
}

variable "region" {
  description = "GCP region for VPC, Cloud SQL, Memorystore, Cloud Run, and the LB IP."
  type        = string
  default     = "us-central1"
}

variable "tenant" {
  description = "Tenant slug — used as the prefix for every GCP resource the stack creates. Combined with var.env to form `<tenant>-litellm-<env>` (e.g. `acme-litellm-stage`)."
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{0,20}$", var.tenant))
    error_message = "tenant must be 1-21 chars, lower-kebab-case, starting with a letter."
  }
}

variable "env" {
  description = "Environment suffix appended to every resource name (e.g. `stage`, `prod`, `dev`)."
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{0,8}$", var.env))
    error_message = "env must be 1-9 chars, lower-kebab-case, starting with a letter."
  }
}

variable "labels" {
  description = "Resource labels merged into every label-supporting resource."
  type        = map(string)
  default = {
    "managed-by" = "terraform"
  }
}

# ---------- Tenant-supplied secrets ----------
#
# All three default to "" so the stack stays usable for trial / OSS deploys.
# Set via TF_VAR_litellm_master_key / TF_VAR_litellm_license /
# TF_VAR_ui_password to keep the values out of tfvars files committed to a
# VCS. The values are still written to Terraform state, so keep state files
# out of version control as well.

variable "litellm_master_key" {
  description = <<-EOT
    Pre-existing LITELLM_MASTER_KEY (must begin with `sk-`). When set, this
    value is written to the master-key Secret Manager entry. When empty,
    the stack auto-generates a random `sk-…` key (preserving today's
    trial-deploy behavior).
  EOT
  type        = string
  default     = ""
  sensitive   = true
}

variable "litellm_license" {
  description = <<-EOT
    LiteLLM enterprise license string. When set, the stack creates a
    `<tenant>-litellm-<env>-license` Secret Manager entry, grants the
    runtime SA accessor on it, and exposes its value to gateway + backend
    as `LITELLM_LICENSE`. Leave empty for OSS-only deploys.
  EOT
  type        = string
  default     = ""
  sensitive   = true
}

variable "ui_password" {
  description = <<-EOT
    UI admin password. When set, the stack creates a
    `<tenant>-litellm-<env>-ui-password` Secret Manager entry, grants the
    runtime SA accessor on it, and exposes its value to the backend as
    `UI_PASSWORD`. Pair with `backend_extra_env.UI_USERNAME` to set the
    matching username. Leave empty to skip — the proxy then falls back to
    the LITELLM_MASTER_KEY for UI login.
  EOT
  type        = string
  default     = ""
  sensitive   = true
}

# ---------- Networking ----------

variable "subnet_cidr" {
  description = "Primary CIDR block for the LiteLLM subnet."
  type        = string
  default     = "10.40.0.0/16"
}

variable "vpc_connector_cidr" {
  description = "CIDR for the Serverless VPC Access connector. /28 required."
  type        = string
  default     = "10.41.0.0/28"
}

# ---------- Component images ----------
#
# Cloud Run only pulls from Artifact Registry, [region.]gcr.io, or
# docker.io — it rejects arbitrary registries (notably ghcr.io) at apply
# time. The four images live on GHCR upstream, so any real deploy must
# either set `image_registry` to an Artifact Registry remote repository
# pointed at ghcr.io (e.g. `us-central1-docker.pkg.dev/my-proj/litellm/berriai`)
# or override the per-component `*_image` vars individually with full URIs.

variable "image_registry" {
  description = <<-EOT
    Registry path prefix used to compose the four LiteLLM image URIs as
    `<image_registry>/litellm-<component>:<image_tag>`. The default
    (`ghcr.io/berriai`) is where the images are published upstream, but
    Cloud Run rejects ghcr.io paths at apply time. For GHCR-backed
    deploys, create an Artifact Registry remote repository pointed at
    `https://ghcr.io` and set this to that repo's path
    (e.g. `us-central1-docker.pkg.dev/<project>/<remote-repo>/berriai`).
    Per-component overrides (`gateway_image`, `backend_image`, `ui_image`,
    `migrations_image`) bypass this entirely when set.
  EOT
  type        = string
  default     = "ghcr.io/berriai"
}

variable "image_tag" {
  description = "Tag applied to all four litellm-* images when composed from `image_registry`. Bump in lockstep when bumping LiteLLM. Must match a tag actually published to GHCR — the split images use the `v`-prefixed semver convention (e.g. `v1.86.0-dev`)."
  type        = string
  default     = "v1.86.0-dev"
}

variable "gateway_image" {
  description = "Full image URI for the gateway. Empty (default) composes from `image_registry` + `image_tag`. Public images or Artifact Registry only — Cloud Run won't authenticate against arbitrary private registries."
  type        = string
  default     = ""
}

variable "backend_image" {
  description = "Full image URI for the backend. Empty (default) composes from `image_registry` + `image_tag`."
  type        = string
  default     = ""
}

variable "ui_image" {
  description = "Full image URI for the UI. Empty (default) composes from `image_registry` + `image_tag`."
  type        = string
  default     = ""
}

variable "migrations_image" {
  description = <<-EOT
    Full image URI for the one-off prisma migration Cloud Run Job. Empty
    (default) composes from `image_registry` + `image_tag` as
    `litellm-migrations`. Built from `migrations/Dockerfile` — slim image
    whose ENTRYPOINT runs `python3 /app/run.py` (assembles DATABASE_URL
    from DATABASE_* env vars via DatabaseURLSettings, then runs
    `prisma migrate deploy`). Should track the same release tag as
    gateway/backend/ui.
  EOT
  type        = string
  default     = ""
}

# ---------- Service sizing ----------

variable "gateway_cpu" {
  description = "Cloud Run CPU per gateway instance."
  type        = string
  default     = "1000m"
}

variable "gateway_memory" {
  description = "Cloud Run memory per gateway instance."
  type        = string
  default     = "4Gi"
}

# Cloud Run autoscales out of the box (request-rate driven). The min/max
# bounds mirror the HPA replica bounds in helm/litellm/values.yaml so each
# stack scales over the same range. Cloud Run has no direct CPU-utilization
# target; the request-concurrency knob below is the closest analog.

variable "gateway_min_instances" {
  description = "Lower bound on gateway Cloud Run instances. Matches helm HPA minReplicas."
  type        = number
  default     = 1
}

variable "gateway_max_instances" {
  description = "Upper bound on gateway Cloud Run instances. Matches helm HPA maxReplicas."
  type        = number
  default     = 10
}

variable "gateway_max_instance_request_concurrency" {
  description = "Concurrent requests one gateway instance handles before Cloud Run scales out. Cloud Run v2 default is 80; lower it for LLM streams that pin a worker for tens of seconds."
  type        = number
  default     = 80
}

variable "backend_cpu" {
  description = "Cloud Run CPU per backend instance. Cloud Run rejects sub-1 CPU when `backend_max_instance_request_concurrency > 1`, so the default is 1000m. Lower this only if you also drop concurrency to 1."
  type        = string
  default     = "1000m"
}

variable "backend_memory" {
  description = "Cloud Run memory per backend instance."
  type        = string
  default     = "4Gi"
}

variable "backend_min_instances" {
  description = "Lower bound on backend Cloud Run instances. Matches helm HPA minReplicas."
  type        = number
  default     = 1
}

variable "backend_max_instances" {
  description = "Upper bound on backend Cloud Run instances. Matches helm HPA maxReplicas."
  type        = number
  default     = 4
}

variable "backend_max_instance_request_concurrency" {
  description = "Concurrent requests one backend instance handles before Cloud Run scales out."
  type        = number
  default     = 80
}

variable "ui_cpu" {
  description = "Cloud Run CPU per UI instance. Cloud Run rejects sub-1 CPU when `ui_max_instance_request_concurrency > 1`, so the default is 1000m. Lower this only if you also drop concurrency to 1 (which makes nginx scale 1:1 with traffic — almost never what you want)."
  type        = string
  default     = "1000m"
}

variable "ui_memory" {
  description = "Cloud Run memory per UI instance. Cloud Run rejects `< 512Mi` when CPU is always-allocated (the default whenever `ui_min_instances > 0`), so the default is 512Mi."
  type        = string
  default     = "512Mi"
}

variable "ui_min_instances" {
  description = "Lower bound on UI Cloud Run instances. Matches helm HPA minReplicas."
  type        = number
  default     = 1
}

variable "ui_max_instances" {
  description = "Upper bound on UI Cloud Run instances. Matches helm HPA maxReplicas."
  type        = number
  default     = 3
}

variable "ui_max_instance_request_concurrency" {
  description = "Concurrent requests one UI instance handles before Cloud Run scales out. The UI is static nginx, so this can be high."
  type        = number
  default     = 200
}

# ---------- Cloud SQL ----------

variable "db_tier" {
  description = "Cloud SQL tier (machine type) for the writer instance."
  type        = string
  default     = "db-custom-2-7680"
}

variable "db_edition" {
  description = "Cloud SQL edition. ENTERPRISE accepts the db-custom-* and db-n1-* tiers. ENTERPRISE_PLUS only accepts db-perf-optimized-* tiers and is ~3x cost — change db_tier in lockstep when switching."
  type        = string
  default     = "ENTERPRISE"

  validation {
    condition     = contains(["ENTERPRISE", "ENTERPRISE_PLUS"], var.db_edition)
    error_message = "db_edition must be ENTERPRISE or ENTERPRISE_PLUS."
  }
}

variable "db_version" {
  description = "Cloud SQL Postgres version."
  type        = string
  default     = "POSTGRES_16"
}

variable "db_name" {
  description = "Initial database created on the Cloud SQL instance."
  type        = string
  default     = "litellm"
}

variable "db_username" {
  description = "Application Postgres user (password-auth). Password is auto-generated and stored in Secret Manager."
  type        = string
  default     = "litellm_app"
}

variable "lb_domains" {
  description = <<-EOT
    DNS names for a Google-managed SSL certificate fronting the LB. When
    non-empty, the stack provisions a 443 forwarding rule + HTTPS target
    proxy + managed cert covering these domains, and the existing 80
    forwarding rule serves a permanent 301 redirect to HTTPS. Leave empty
    ([]) to disable TLS (must combine with `allow_plaintext_lb = true` for
    the plan to succeed — see README.md "TLS"). Each domain must already
    resolve to the LB's anycast IP (`lb_ip` output) for managed-cert
    provisioning to succeed.
  EOT
  type        = list(string)
  default     = []
}

variable "allow_plaintext_lb" {
  description = <<-EOT
    Opt into HTTP-only mode on the load balancer (port 80, no TLS).
    Default false: `terraform plan` fails when `lb_domains = []` so the
    operator must either provide DNS names for a managed cert or
    consciously opt out. Intended for short-lived trial / dev stacks only.
  EOT
  type        = bool
  default     = false
}
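
# Sketch only: the guard that makes `terraform plan` fail is not defined in
# this file, and how the stack actually implements it is not shown here.
# One hypothetical way to express such a guard is a precondition on a no-op
# terraform_data resource. The resource name `tls_guard_example` is an
# assumption for illustration and is not part of the stack.
#
# resource "terraform_data" "tls_guard_example" {
#   lifecycle {
#     precondition {
#       # Passes when TLS domains are supplied or plaintext was opted into.
#       condition     = length(var.lb_domains) > 0 || var.allow_plaintext_lb
#       error_message = "Set lb_domains for a managed cert, or set allow_plaintext_lb = true to accept an HTTP-only load balancer."
#     }
#   }
# }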

variable "cloudsql_deletion_protection" {
  description = "Cloud SQL instance-level deletion protection (writer + reader). Default true — `terraform destroy` (and `terraform apply` operations that replace the instance) will fail with a clear error rather than silently dropping the database. Set false only for ephemeral / CI environments."
  type        = bool
  default     = true
}

variable "gcs_force_destroy" {
  description = <<-EOT
    Allow `terraform destroy` to delete the GCS bucket even when it still
    contains objects (request log archives, /v1/files storage, GCS cache
    backend). Default false — destroying a non-empty bucket fails, acting
    as a tripwire against accidental data loss. Set true only for
    ephemeral / CI environments. Mirrors `s3_force_destroy` on AWS and
    `cloudsql_deletion_protection` on the database side.
  EOT
  type        = bool
  default     = false
}

# ---------- Memorystore (Redis) ----------

variable "redis_tier" {
  description = "Memorystore tier — STANDARD_HA for production, BASIC for dev."
  type        = string
  default     = "STANDARD_HA"
}

variable "redis_memory_size_gb" {
  description = "Memorystore Redis memory size in GiB."
  type        = number
  default     = 1
}

# ---------- Extras / proxy_config ----------

variable "gateway_extra_env" {
  description = "Plain-text env vars layered onto the gateway."
  type        = map(string)
  default     = {}
}

variable "backend_extra_env" {
  description = "Plain-text env vars layered onto the backend."
  type        = map(string)
  default     = {}
}

variable "gateway_extra_secrets" {
  description = <<-EOT
    Extra env vars sourced from Google Secret Manager, applied to the
    gateway. Map of env-var name to the Secret Manager **secret resource
    ID** (`projects/<project>/secrets/<name>` — *not* a version resource
    ID; the Cloud Run secret_key_ref binding and the stack's IAM grant
    both reject `/versions/<n>` suffixes). Versions are always resolved
    as `latest`; if you need a pinned version, edit
    `local.gateway_extra_secret_kv` in `cloudrun.tf` directly.

    Example:
      gateway_extra_secrets = {
        OPENAI_API_KEY = "projects/my-proj/secrets/openai-api-key"
      }

    The Cloud Run service account auto-gains roles/secretmanager.secretAccessor
    on each secret listed here.
  EOT
  type        = map(string)
  default     = {}
}

variable "backend_extra_secrets" {
  description = "Same shape as gateway_extra_secrets (secret resource ID, version always `latest`), layered onto the backend."
  type        = map(string)
  default     = {}
}

variable "proxy_config" {
  description = <<-EOT
    LiteLLM proxy config (contents of config.yaml). Mirrors the helm chart's
    `gateway.config.proxy_config`. Passed to gateway, backend, and the
    migration job as a base64-encoded env var and decoded to
    /tmp/litellm-config.yaml at container start; CONFIG_FILE_PATH is set
    automatically. Reference env-injected secrets from the YAML via
    `os.environ/<NAME>`. Leave empty ({}) to skip.
  EOT
  type        = any
  default     = {}
}
18
terraform/litellm/gcp/versions.tf
Normal file
@ -0,0 +1,18 @@
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 6.10"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 6.10"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6"
    }
  }
}