refactor: convert AWS and GCP Terraform stacks into reusable modules … (#28103)

* refactor: convert AWS and GCP Terraform stacks into reusable modules with examples/default entry point

- Remove `provider` blocks from both AWS and GCP stack roots so the modules
  can be consumed with `count`, `for_each`, `depends_on`, assumed-role or
  aliased providers — patterns that are forbidden when a module owns its own
  provider configuration
- Add `examples/default/` thin-root wrappers for both stacks that wire the
  provider (AWS) / providers (google + google-beta) and call the module with
  a curated variable surface, preserving the one-command deploy experience
- Move `terraform.tfvars.example` files into `examples/default/` alongside
  the new roots; update example comments to reflect the curated variable surface
- Thread `local.tags` (containing `litellm:stack`, `managed-by`, and
  `var.tags`) explicitly onto every taggable AWS resource since the module no
  longer controls the provider's `default_tags`; GCP resource labels already
  flow through the module's `labels` input
- Add `examples/default/variables.tf` and `outputs.tf` for both stacks,
  exposing the most-used knobs and re-exporting all module outputs
- Commit provider lock files for both examples so `terraform init` is
  reproducible without a network fetch
- Update top-level and per-stack READMEs to document the module-first design,
  the `for_each` multi-tenant pattern, and the `examples/default/` quick-start path

* docs(terraform): address review — state-migration guide, tag dedupe, for_each note

- Add 'Migrating an existing deployment' section to AWS & GCP READMEs
  documenting the required terraform state mv step (resource addresses now
  gain a module.litellm. prefix under the examples/default root)
- Remove redundant managed-by tag from the AWS example providers.tf;
  reserve default_tags there for org-wide tags only
- Document the for_each single-provider limitation for GCP (no
  configuration_aliases) in the README and example main.tf

Resolves LIT-3504

* docs(terraform/gcp): note expected SSL cert replacement in state-migration guide

The managed SSL cert is named with a hash of lb_domains, so TLS-enabled
stacks that migrated from the old un-hashed name will see one
create_before_destroy cert replacement after terraform state mv — not a
clean 'No changes'. Document that this single replacement is expected and
safe.

* docs(terraform): drop state-migration guides

The AWS/GCP stacks have never been published, so there are no existing
deployments to migrate from the old root-module layout. Remove the
'Migrating an existing deployment' sections from both READMEs.

* docs(terraform): call out image-registry override required for GCP 1-click

The GCP stack's default image_registry points at ghcr.io, which Cloud
Run won't authenticate against, so any real deploy (HCP Terraform
no-code or otherwise) must override it. Document that as a hard
requirement on the GCP README rather than a side note, and add a
top-level HCP Terraform 1-click section enumerating the required
inputs per stack and the migration-task caveat for HCP-hosted runners.

* feat(terraform/aws): mount proxy_config from S3 and wire OpenTelemetry v2

proxy_config

Drop the inline LITELLM_PROXY_CONFIG_B64 env var. Upload the YAML to S3
at config/litellm-config.yaml; gateway and backend container entrypoints
download it to /tmp/litellm-config.yaml via boto3 before exec'ing
uvicorn. The S3 object etag is wired into the task definition so a
config edit produces a new task-def revision and a rolling redeploy. The
existing s3_access policy already grants the task role s3:GetObject on
this bucket, so no IAM changes were needed for the mount itself.

OpenTelemetry v2

New variables otel_endpoint, otel_exporter, otel_service_name, and
otel_headers_secret_arn. Setting otel_endpoint to a non-empty value adds
LITELLM_OTEL_V2=true plus OTEL_EXPORTER / OTEL_ENDPOINT /
OTEL_SERVICE_NAME / OTEL_ENVIRONMENT_NAME to the shared env block; an
optional Secrets Manager ARN backs OTEL_HEADERS for collectors that need
an auth header. Execution role auto-gains GetSecretValue on that ARN.
Empty endpoint = nothing added, so existing deployments are unchanged.

* feat(terraform/gcp): add DeployStack one-click installer

Wires up a Cloud Shell "Open in Cloud Shell" badge backed by the
GoogleCloudPlatform DeployStack flow so examples/default can be
installed from a click in the README without a local terraform setup.

- examples/default/deploystack.json drives project/region collection
  plus prompts for tenant, env, image_tag, and allow_plaintext_lb.
  Complex inputs (proxy_config, *_extra_secrets, lb_domains) and
  sensitive vars (litellm_master_key, litellm_license, ui_password)
  stay tfvars / env only so they never land in a committed file.
- examples/default/TUTORIAL.md is a Cloud Shell walkthrough that
  enables required APIs, creates the GHCR-passthrough Artifact
  Registry repo, optionally exports the TF_VAR_* secrets, runs
  `deploystack install`, and shows how to fetch the master key plus
  migrate from plaintext LB to TLS.
- Renames var.project to var.project_id across the module and the
  examples/default wrapper to match the variable DeployStack injects
  from `collect_project: true`. Breaking rename for anyone with a
  `project = ...` line in terraform.tfvars; the fix is one line.

* feat(terraform/gcp): mount proxy_config from GCS and wire OpenTelemetry v2

proxy_config

Drop the inline LITELLM_PROXY_CONFIG_B64 env var and the python-decode
startup fragment. Upload the YAML to a dedicated GCS bucket as
config.yaml, then mount it read-only into the gateway and backend at
/etc/litellm via Cloud Run v2's gcsfuse volume. CONFIG_FILE_PATH points
at the mount; an md5 of the YAML rides along as PROXY_CONFIG_HASH so a
config-only edit forces a new Cloud Run revision (gcsfuse only surfaces
new objects on container restart, so without the hash an updated
proxy_config would sit in the bucket unread).

The config bucket is separate from the data-plane bucket so the runtime
SA can hold objectViewer here (read-only at runtime) while keeping
objectAdmin on the data-plane bucket. Both bucket and IAM binding are
gated on proxy_config != {}; an empty config skips bucket creation and
mounts nothing.

OpenTelemetry v2

LITELLM_OTEL_V2=true is now wired into shared_env_kv unconditionally so
both the gateway and backend boot with the integration enabled. It's
dormant until otel_endpoint is non-empty; setting it injects
OTEL_EXPORTER / OTEL_ENDPOINT / OTEL_ENVIRONMENT_NAME plus a
per-component OTEL_SERVICE_NAME (\${tenant}-litellm-\${env}-{gateway,backend})
so spans land tagged with the right hop. otel_headers_secret takes a
Secret Manager resource ID for OTEL_HEADERS (collector auth); the
runtime SA auto-gains roles/secretmanager.secretAccessor on it.
otel_capture_message_content defaults to no_content matching the litellm
default. Any OTEL_* key set in *_extra_env wins over the defaults so
Cloud Run doesn't reject the apply on the duplicate-env-name check.

* refactor(terraform): make AWS and GCP stacks behave identically

Bring both modules to the same surface and the same runtime behavior so
swapping clouds (or reading either README) is symmetric.

Labels and tags. GCP previously stamped var.labels onto only the two GCS
buckets, leaving Cloud Run, Cloud SQL, Memorystore, Secret Manager, and
the LB resources unlabeled; the variable description claimed full
coverage. Now the module computes local.labels (litellm-stack +
managed-by + var.labels, mirroring AWS's local.tags) and threads it onto
every label-supporting resource: Cloud Run services and the migrations
job, Cloud SQL writer and reader (via user_labels), Memorystore, Secret
Manager entries (master_key, license, ui_password, db_password), both
GCS buckets, the global LB address, and the http/https forwarding rules.
GCP keys use 'litellm-stack' instead of AWS's 'litellm:stack' because
GCP label keys forbid colons; var.labels now defaults to {}.

OpenTelemetry v2 is opt-in on both stacks. AWS already gated everything
on otel_endpoint; GCP previously stamped LITELLM_OTEL_V2=true into
shared_env unconditionally and only ungated the OTEL_* block. Both
stacks now do the same thing: leave otel_endpoint empty and nothing
OTel-related lands in the container env; set it and gateway and backend
get LITELLM_OTEL_V2=true plus OTEL_EXPORTER, OTEL_ENDPOINT,
OTEL_ENVIRONMENT_NAME, OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT,
and a per-component OTEL_SERVICE_NAME (${tenant}-litellm-${env}-gateway
or -backend) so spans land tagged with the right hop. AWS picks up the
richer GCP surface: otel_environment_name (defaults to var.env),
otel_capture_message_content (defaults to no_content), and *_extra_env
override filtering so a caller-set OTEL_* key wins over the default for
that service (ECS allows duplicates, but the filter gives the same
predictable last-wins shape Cloud Run enforces). var.otel_service_name
on AWS is gone, replaced by the per-component naming.

uvicorn workers. GCP gains gateway_num_workers, matching AWS; threads
into the gateway args as --workers ${var.gateway_num_workers}.

Docs reflect the parity: each README's OTel section, the GCP 'Using as
a module' Labels paragraph, and a new feature-parity table in the
top-level README that lays out the AWS/GCP input mapping side by side.

* fix(terraform/aws): expose skip_final_snapshot through the default example

The example wrapper already exposed `s3_force_destroy` so ephemeral / CI
stacks could destroy the S3 bucket without manual cleanup, but the matching
Aurora knob (`skip_final_snapshot`) was hidden behind the module surface.
That meant a `terraform destroy` on a trial stack still produced a
`<cluster>-final-<short-sha>` snapshot, with no opt-out short of editing the
module call.

Adds `var.skip_final_snapshot` to the example (default `false`, preserving
the data-loss tripwire) and threads it through to the module input,
mirroring the existing `s3_force_destroy` pattern. Documented alongside it
in the tfvars example.

Verified by deploying the example end-to-end against a clean AWS account
(VPC + Aurora w/ IAM auth + Redis + ALB + 3 ECS services), confirming all
services reach steady state and the data plane serves traffic, then running
`terraform destroy` with `skip_final_snapshot = true` to a clean teardown
(93 destroyed, no Aurora snapshot left behind, no leftover billable
resources).

---------

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: yassin-berriai <yassin.kortam@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
Yassin Kortam 2026-06-06 12:57:44 -07:00 committed by GitHub
parent fdade8a84e
commit 1cff02f50e
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
44 changed files with 1711 additions and 137 deletions

View File

@ -1,18 +1,34 @@
# LiteLLM Terraform stacks
Two self-contained Terraform root modules that deploy the **componentized**
LiteLLM proxy — the gateway, backend, and UI as three independent containers
(see `helm/litellm/` for the canonical chart with the same split).
Two self-contained, reusable Terraform **modules** that deploy the
**componentized** LiteLLM proxy — the gateway, backend, and UI as three
independent containers (see `helm/litellm/` for the canonical chart with the
same split).
Each module declares **no `provider` block of its own**, so it can be called
with `count` / `for_each` / `depends_on` and the caller controls region,
assume-role / impersonation, aliases, and `default_tags`. A ready-to-run root
that wires the provider lives at `<stack>/examples/default/` — that's the
one-command deploy path. To embed a stack in your own config, call the module
by source:
```hcl
module "litellm" {
source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
# ... inputs ...
}
```
| Stack | Compute | Database (writer + reader) | Cache | Object store | Public entrypoint |
| ------ | ----------- | ---------------------------------- | ----------- | ------------ | ------------------ |
| `aws/` | ECS Fargate | Aurora Postgres (IAM auth) | ElastiCache | S3 | Application LB |
| `gcp/` | Cloud Run | Cloud SQL Postgres (password auth) | Memorystore | GCS | External HTTPS LB |
Each stack creates its own VPC and managed data stores — drop in a tfvars
file and run `terraform apply`. Both stacks support a typed `proxy_config`
input (mirrors `helm/litellm`'s `gateway.config.proxy_config`) and per-component
extra env vars / secret-manager refs.
Each stack creates its own VPC and managed data stores — from
`<stack>/examples/default/`, drop in a tfvars file and run `terraform apply`.
Both stacks support a typed `proxy_config` input (mirrors `helm/litellm`'s
`gateway.config.proxy_config`) and per-component extra env vars /
secret-manager refs.
## Components
@ -147,6 +163,39 @@ against the backend image:
Run the migration job once after the first `terraform apply` and before the
gateway/backend services start serving traffic.
## Feature parity between stacks
The two modules expose the same conceptual surface; concrete inputs differ
only where the underlying cloud forces it.
| Capability | AWS input(s) | GCP input(s) |
| -------------------------------- | ------------------------------------------------------- | --------------------------------------------------------- |
| Tenant + env naming | `tenant`, `env` | `tenant`, `env` |
| Pre-shared master key / license | `litellm_master_key`, `litellm_license` | `litellm_master_key`, `litellm_license` |
| UI admin password | `ui_password` | `ui_password` |
| Per-deployment tags / labels | `tags` (`map(string)`) | `labels` (`map(string)`) |
| TLS posture | `acm_certificate_arn`, `allow_plaintext_alb` | `lb_domains`, `allow_plaintext_lb` |
| Force destroy of object store | `s3_force_destroy` | `gcs_force_destroy` |
| Database deletion protection | `skip_final_snapshot` | `cloudsql_deletion_protection` |
| `proxy_config` (typed YAML map) | `proxy_config` | `proxy_config` |
| Extra plain env per component | `gateway_extra_env`, `backend_extra_env` | `gateway_extra_env`, `backend_extra_env` |
| Extra secret-backed env | `gateway_extra_secrets`, `backend_extra_secrets` (ARNs) | `gateway_extra_secrets`, `backend_extra_secrets` (resource IDs) |
| Uvicorn `--workers` on gateway | `gateway_num_workers` | `gateway_num_workers` |
| OpenTelemetry v2 (opt-in) | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret_arn` | `otel_endpoint`, `otel_exporter`, `otel_environment_name`, `otel_capture_message_content`, `otel_headers_secret` |
Each module stamps its own stack-identity tag (`litellm:stack` on AWS,
`litellm-stack` on GCP — GCP label keys forbid colons) plus
`managed-by = "terraform"` onto every taggable / labelable resource and
merges `var.tags` / `var.labels` on top. Provider `default_tags` on AWS
merge on top of all of these.
OTel is opt-in on both clouds: leave `otel_endpoint` empty and nothing
OTel-related is added to the container env; set it and both gateway and
backend get `LITELLM_OTEL_V2=true` plus the full `OTEL_*` block, with
`OTEL_SERVICE_NAME` stamped per component
(`<tenant>-litellm-<env>-gateway` and `-backend`). Any `OTEL_*` key set
in `gateway_extra_env` / `backend_extra_env` wins for that service.
## What's not included
- TLS certificates / custom domains. Both stacks expose plain-HTTP load
@ -156,4 +205,46 @@ gateway/backend services start serving traffic.
backend block to `versions.tf` when graduating to a team environment.
- Observability beyond the cloud provider's defaults (CloudWatch logs on
AWS, Cloud Logging on GCP). Wire your own Prometheus / Datadog / Langfuse
via the `*_extra_env` variables.
via the `*_extra_env` variables, or turn on OTel v2 (see the parity
table above).
## HCP Terraform no-code (1-click) deploy
Both stacks are publishable as no-code modules in HCP Terraform's private
registry. The end-user flow is: open the no-code launch URL, fill in a
few inputs, hit *Create workspace*, and HCP runs plan/apply against your
cloud account using a variable-set of credentials (static keys or
dynamic-credentials OIDC).
Required overrides the launcher must supply per stack:
- **AWS** (`terraform/litellm/aws`): `region`, `azs`, `tenant`, `env`.
The image vars (`gateway_image`, `backend_image`, `ui_image`,
`migrations_image`) can be left at their defaults — the GHCR images
are anonymous-readable and ECS Fargate pulls them without extra
credentials.
- **GCP** (`terraform/litellm/gcp`): `project`, `tenant`, `env`, **and
one of**:
- `image_registry` pointed at an Artifact Registry **remote** repository
backed by `https://ghcr.io` (e.g.
`us-central1-docker.pkg.dev/<project>/litellm/berriai`), so Cloud Run
pulls the four upstream `litellm-*` images through it; or
- all four per-component `*_image` URIs pointing at images mirrored
into a regular Artifact Registry repo.
The defaults (`ghcr.io/berriai`) cause Cloud Run admission to reject
the service spec — Cloud Run only authenticates against Artifact
Registry, `[region.]gcr.io`, or `docker.io`. See
`terraform/litellm/gcp/README.md#image-pulls` for the
`gcloud artifacts repositories create … --mode=remote-repository`
command that sets up the passthrough repo (one-time, per project).
What still requires a manual step regardless of HCP no-code:
- The one-off migration task. The stacks auto-run it via `local-exec`
during `terraform apply`, but that requires the `aws` / `gcloud` CLI
on the runner. HCP-hosted runners don't have them; use an HCP agent
pool with a custom image that includes the relevant CLI, or run the
command printed in the `migration_run_command` output by hand after
the first apply.

View File

@ -44,9 +44,12 @@ needs the `aws` CLI installed and authenticated.
### `proxy_config` (preferred)
Mirrors the helm chart's `gateway.config.proxy_config`. The map is YAML-encoded
and base64-passed to gateway, backend, and the migration task; each container
decodes it to `/tmp/litellm-config.yaml` at startup and sets `CONFIG_FILE_PATH`
to match.
and uploaded to S3 (`config/litellm-config.yaml` in the stack's bucket); the
gateway and backend container entrypoints download it to
`/tmp/litellm-config.yaml` at task start via boto3 and set `CONFIG_FILE_PATH`
to match. The S3 object's etag is wired into the task definition, so editing
`proxy_config` produces a new task-def revision and a rolling redeploy of both
services.
```hcl
proxy_config = {
@ -119,6 +122,42 @@ aws secretsmanager create-secret \
--secret-string "sk-proj-..."
```
### Observability (OpenTelemetry v2)
OTel v2 (https://docs.litellm.ai/docs/observability/opentelemetry_v2) is
opt-in and gated entirely on `otel_endpoint`. Empty (default) and nothing
OTel-related is added to the container env. Set it and both gateway and
backend gain `LITELLM_OTEL_V2=true` plus the `OTEL_*` block, with
`OTEL_SERVICE_NAME` stamped per component (`${tenant}-litellm-${env}-gateway`
and `-backend`) so spans land tagged with the right hop. Any `OTEL_*` key
set in `gateway_extra_env` / `backend_extra_env` overrides the default for
that service.
```hcl
otel_endpoint = "http://otel-collector.internal:4318"
otel_exporter = "otlp_http" # otlp_grpc, console
otel_environment_name = "prod" # defaults to var.env
```
For collectors that require an auth header, store the comma-separated
`key=value` string in Secrets Manager and reference it via
`otel_headers_secret_arn`. The execution role auto-gains
`secretsmanager:GetSecretValue` on that ARN.
```hcl
otel_headers_secret_arn = "arn:aws:secretsmanager:us-west-2:111122223333:secret:honeycomb-otel-headers-AbCdEf"
```
`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` defaults to
`no_content`; flip `otel_capture_message_content = "prompt_and_completion"`
only after auditing what lands in the backend, since prompts and
completions are typically sensitive.
Vendor presets (Arize, Phoenix, Langfuse OTel, Weave, Langtrace, Levo,
AgentOps) live under `proxy_config.litellm_settings.callbacks` and are
orthogonal to the OTLP variables above; their credentials still go in
`*_extra_secrets`.
## Tenant deployment
Every resource the stack creates is named `${tenant}-litellm-${env}` (or
@ -132,10 +171,11 @@ pair differs:
| `acme` | `prod` | `acme-litellm-prod-master-key` |
| `globex` | `dev` | `globex-litellm-dev-license` |
For a per-tenant instance, the only inputs that change are the tenant
slug, env, and the two pre-issued secrets:
For a per-tenant instance via the example root, the only inputs that
change are the tenant slug, env, and the two pre-issued secrets:
```bash
cd terraform/litellm/aws/examples/default
export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE
@ -146,6 +186,22 @@ terraform apply \
-var "env=stage"
```
To run *many* tenants from a single config, call the module with
`for_each` instead of one root per tenant (see "Using as a module"):
```hcl
module "litellm" {
for_each = toset(["acme", "globex"])
source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
tenant = each.key
env = "prod"
region = "us-west-2"
azs = ["us-west-2a", "us-west-2b"]
}
```
(This `for_each` form is only possible because the module declares no
provider block — the original root-with-provider layout forbade it.)
Both `litellm_master_key` and `litellm_license` are optional:
- Omit `litellm_master_key` → the stack auto-generates a random `sk-…`
value (trial/dev path).
@ -159,14 +215,21 @@ example files.
## Quick start
```bash
cd terraform/litellm/aws
cd terraform/litellm/aws/examples/default
cp terraform.tfvars.example terraform.tfvars
# Edit: region, tenant, env, azs, *_image, proxy_config, gateway_extra_secrets.
# Edit: region, tenant, env, azs, proxy_config, gateway_extra_secrets.
terraform init
terraform apply
```
`examples/default/` is a thin root that configures the `aws` provider and
calls the module (`../../`). It exposes a curated variable surface; for
advanced knobs (per-component CPU/memory/workers, autoscaling, RDS/Redis
sizing, per-component image pins) set them on the `module "litellm"` block
in `examples/default/main.tf`, or call the module from your own config —
see "Using as a module" below.
That single apply provisions everything, runs the DB user bootstrap, runs the
schema migration, and only then starts the gateway/backend services. When it
returns, the stack is serving traffic.
@ -179,6 +242,34 @@ aws secretsmanager get-secret-value \
--query SecretString --output text
```
## Using as a module
The directory itself is a module with **no `provider` block** — the caller
owns provider config. That means you can call it directly with `for_each`
(many tenants from one config), `count` (conditional stacks), `depends_on`,
an assume-role / aliased provider, etc.:
```hcl
provider "aws" {
region = "us-west-2"
assume_role { role_arn = "arn:aws:iam::111122223333:role/deployer" }
}
module "litellm" {
source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
region = "us-west-2"
tenant = "acme"
env = "prod"
azs = ["us-west-2a", "us-west-2b"]
# ...any of the inputs in variables.tf...
}
```
Tags: the module threads its own `litellm:stack` / `managed-by` / `var.tags`
onto every taggable resource. Any `default_tags` on your provider merge on
top — set org-wide tags there, per-deployment tags via the `tags` input.
## Image pulls
The defaults pull from `ghcr.io/berriai/litellm-<component>:v1.86.0-dev`,
@ -238,8 +329,8 @@ losing the contents.
| File | What's in it |
| ----------------- | --------------------------------------------------------------------- |
| `versions.tf` | Terraform + provider version constraints |
| `providers.tf` | AWS provider (region + default tags) |
| `versions.tf` | Terraform + `required_providers` constraints (module declares no provider config) |
| `examples/default/` | Thin root: `aws` provider (with an optional `default_tags` slot for org-wide tags) + a call to the module. The one-command deploy path. |
| `variables.tf` | All input variables |
| `locals.tf` | Path-prefix lists for ALB routing (mirror of `helm/.../ingress.yaml`) |
| `network.tf` | VPC, subnets, IGW, NAT, route tables, security groups |

View File

@ -6,6 +6,8 @@ resource "aws_lb" "this" {
subnets = aws_subnet.public[*].id
idle_timeout = 120
tags = local.tags
}
locals {
@ -35,6 +37,8 @@ resource "aws_lb_target_group" "gateway" {
}
deregistration_delay = 30
tags = local.tags
}
resource "aws_lb_target_group" "backend" {
@ -54,6 +58,8 @@ resource "aws_lb_target_group" "backend" {
}
deregistration_delay = 30
tags = local.tags
}
resource "aws_lb_target_group" "ui" {
@ -73,6 +79,8 @@ resource "aws_lb_target_group" "ui" {
}
deregistration_delay = 30
tags = local.tags
}
# HTTP listener. When TLS is enabled this only serves a permanent
@ -106,6 +114,8 @@ resource "aws_lb_listener" "http" {
error_message = "ALB has no HTTPS listener. Either set `acm_certificate_arn` to enable TLS, or set `allow_plaintext_alb = true` to opt into HTTP-only (trial / dev only)."
}
}
tags = local.tags
}
# HTTPS listener. Only created when an ACM cert ARN is supplied terminates
@ -122,6 +132,8 @@ resource "aws_lb_listener" "https" {
type = "forward"
target_group_arn = aws_lb_target_group.backend.arn
}
tags = local.tags
}
# UI exact paths (/, /favicon.ico, /ui) priority 10.
@ -139,6 +151,8 @@ resource "aws_lb_listener_rule" "ui_exact" {
values = local.ui_exact_paths
}
}
tags = local.tags
}
# UI prefix paths (/_next/*, /litellm-asset-prefix/*, /assets/*, /ui/*) priority 20.
@ -156,6 +170,8 @@ resource "aws_lb_listener_rule" "ui_prefix" {
values = local.ui_path_prefixes
}
}
tags = local.tags
}
# Gateway prefix rules one per chunk-of-5 because ALB caps a path-pattern
@ -176,4 +192,6 @@ resource "aws_lb_listener_rule" "gateway" {
values = each.value
}
}
tags = local.tags
}

View File

@ -32,6 +32,8 @@ resource "aws_iam_policy" "bootstrap_secrets" {
Resource = [aws_secretsmanager_secret.db_master_password.arn]
}]
})
tags = local.tags
}
resource "aws_iam_role_policy_attachment" "task_execution_bootstrap_secrets" {
@ -43,6 +45,8 @@ resource "aws_iam_role_policy_attachment" "task_execution_bootstrap_secrets" {
resource "aws_cloudwatch_log_group" "bootstrap_db" {
name = "/ecs/${local.name}/bootstrap-db"
retention_in_days = var.log_retention_days
tags = local.tags
}
locals {
@ -101,6 +105,8 @@ resource "aws_ecs_task_definition" "bootstrap_db" {
}
}
}])
tags = local.tags
}
# ---------- Bootstrap trigger ----------

View File

@ -5,26 +5,36 @@ resource "aws_ecs_cluster" "this" {
name = "containerInsights"
value = "enabled"
}
tags = local.tags
}
resource "aws_cloudwatch_log_group" "gateway" {
name = "/ecs/${local.name}/gateway"
retention_in_days = var.log_retention_days
tags = local.tags
}
resource "aws_cloudwatch_log_group" "backend" {
name = "/ecs/${local.name}/backend"
retention_in_days = var.log_retention_days
tags = local.tags
}
resource "aws_cloudwatch_log_group" "ui" {
name = "/ecs/${local.name}/ui"
retention_in_days = var.log_retention_days
tags = local.tags
}
resource "aws_cloudwatch_log_group" "migrations" {
name = "/ecs/${local.name}/migrations"
retention_in_days = var.log_retention_days
tags = local.tags
}
# Shared env block fed to gateway, backend, and the migration task. Mirrors
@ -34,6 +44,38 @@ resource "aws_cloudwatch_log_group" "migrations" {
# HOST/PORT/USER/NAME plus an IAM-signed token, so no DB password is needed
# in the task definition.
locals {
# OTel v2 is opt-in and gated on otel_endpoint, matching the GCP stack.
# When set, LITELLM_OTEL_V2 flips on alongside the OTEL_* block, with
# OTEL_SERVICE_NAME stamped per component so spans land tagged with the
# right hop. Any OTEL_* key set in *_extra_env wins over the default for
# that service (ECS allows duplicates but last-wins is undefined, so we
# filter here for the same predictable behavior GCP gets from Cloud Run's
# hard duplicate-rejection).
otel_enabled = var.otel_endpoint != ""
otel_environment_name = var.otel_environment_name != "" ? var.otel_environment_name : var.env
otel_shared_env = local.otel_enabled ? [
{ name = "LITELLM_OTEL_V2", value = "true" },
{ name = "OTEL_EXPORTER", value = var.otel_exporter },
{ name = "OTEL_ENDPOINT", value = var.otel_endpoint },
{ name = "OTEL_ENVIRONMENT_NAME", value = local.otel_environment_name },
{ name = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", value = var.otel_capture_message_content },
] : []
gateway_otel_env_raw = concat(local.otel_shared_env, local.otel_enabled ? [
{ name = "OTEL_SERVICE_NAME", value = "${local.name}-gateway" },
] : [])
backend_otel_env_raw = concat(local.otel_shared_env, local.otel_enabled ? [
{ name = "OTEL_SERVICE_NAME", value = "${local.name}-backend" },
] : [])
gateway_otel_env = [
for e in local.gateway_otel_env_raw : e if !contains(keys(var.gateway_extra_env), e.name)
]
backend_otel_env = [
for e in local.backend_otel_env_raw : e if !contains(keys(var.backend_extra_env), e.name)
]
otel_secrets = local.otel_enabled && var.otel_headers_secret_arn != "" ? [
{ name = "OTEL_HEADERS", valueFrom = var.otel_headers_secret_arn },
] : []
shared_env = [
{ name = "IAM_TOKEN_DB_AUTH", value = "true" },
{ name = "DATABASE_HOST", value = aws_rds_cluster.this.endpoint },
@ -65,6 +107,7 @@ locals {
var.litellm_license == "" ? [] : [
{ name = "LITELLM_LICENSE", valueFrom = aws_secretsmanager_secret.license[0].arn },
],
local.otel_secrets,
)
# Backend-only managed secrets. UI_PASSWORD is consumed by the management
@ -91,20 +134,26 @@ locals {
]
# Mirrors the helm chart's gateway.config.create / configmap pattern.
# ECS Fargate has no ConfigMap analogue, so we pass the YAML as a
# base64-encoded env var and decode it at container start via a tiny
# python shim that prepends the image's normal uvicorn entrypoint.
# ECS Fargate has no ConfigMap analogue, so the YAML is uploaded to S3
# (see aws_s3_object.proxy_config in s3.tf) and the container entrypoint
# downloads it to /tmp/litellm-config.yaml via boto3 before exec'ing
# uvicorn. The S3 object's etag is embedded in the task definition so a
# config edit forces a new task-def revision and a rolling redeploy.
proxy_config_enabled = length(keys(var.proxy_config)) > 0
proxy_config_b64 = local.proxy_config_enabled ? base64encode(yamlencode(var.proxy_config)) : ""
proxy_config_path = "/tmp/litellm-config.yaml"
proxy_config_env = local.proxy_config_enabled ? [
{ name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 },
{ name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
{ name = "CONFIG_FILE_PATH", value = local.proxy_config_path },
{ name = "LITELLM_PROXY_CONFIG_S3_BUCKET", value = aws_s3_bucket.this.bucket },
{ name = "LITELLM_PROXY_CONFIG_S3_KEY", value = aws_s3_object.proxy_config[0].key },
{ name = "LITELLM_PROXY_CONFIG_S3_ETAG", value = aws_s3_object.proxy_config[0].etag },
] : []
proxy_config_fetch_cmd = "python -c \"import os, boto3; boto3.client('s3', region_name=os.environ['AWS_REGION']).download_file(os.environ['LITELLM_PROXY_CONFIG_S3_BUCKET'], os.environ['LITELLM_PROXY_CONFIG_S3_KEY'], os.environ['CONFIG_FILE_PATH'])\""
# Gateway always needs --workers wired in (no NUM_WORKERS env var support
# in the image entrypoint). When proxy_config is enabled we also have to
# decode the base64 config first, so the command goes through `sh -c`;
# pull the config from S3 first, so the command goes through `sh -c`;
# otherwise we keep the image's ENTRYPOINT and only override `command`.
gateway_uvicorn_args = "--host 0.0.0.0 --port 4000 --workers ${var.gateway_num_workers}"
backend_uvicorn_args = "--host 0.0.0.0 --port 4001"
@ -112,7 +161,7 @@ locals {
gateway_proxy_overrides = local.proxy_config_enabled ? {
entryPoint = ["sh", "-c"]
command = [
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\" && exec uvicorn gateway.main:app ${local.gateway_uvicorn_args}"
"${local.proxy_config_fetch_cmd} && exec uvicorn gateway.main:app ${local.gateway_uvicorn_args}"
]
} : {
# Mirror the image's ENTRYPOINT so we can append --workers via command.
@ -123,7 +172,7 @@ locals {
backend_proxy_overrides = local.proxy_config_enabled ? {
entryPoint = ["sh", "-c"]
command = [
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\" && exec uvicorn backend.main:app ${local.backend_uvicorn_args}"
"${local.proxy_config_fetch_cmd} && exec uvicorn backend.main:app ${local.backend_uvicorn_args}"
]
} : {}
}
@ -148,6 +197,7 @@ resource "aws_ecs_task_definition" "gateway" {
portMappings = [{ containerPort = 4000, protocol = "tcp" }]
environment = concat(
local.shared_env,
local.gateway_otel_env,
local.gateway_extra_env_list,
local.proxy_config_env,
)
@ -169,6 +219,8 @@ resource "aws_ecs_task_definition" "gateway" {
local.gateway_proxy_overrides,
)
])
tags = local.tags
}
resource "aws_ecs_service" "gateway" {
@ -206,6 +258,8 @@ resource "aws_ecs_service" "gateway" {
aws_lb_listener.https,
terraform_data.migration,
]
tags = local.tags
}
# ---------- Backend ----------
@ -229,6 +283,7 @@ resource "aws_ecs_task_definition" "backend" {
environment = concat(
local.shared_env,
local.backend_default_env,
local.backend_otel_env,
local.backend_extra_env_list,
local.proxy_config_env,
)
@ -246,6 +301,8 @@ resource "aws_ecs_task_definition" "backend" {
local.backend_proxy_overrides,
)
])
tags = local.tags
}
resource "aws_ecs_service" "backend" {
@ -279,6 +336,8 @@ resource "aws_ecs_service" "backend" {
aws_lb_listener.https,
terraform_data.migration,
]
tags = local.tags
}
# ---------- UI ----------
@ -312,6 +371,8 @@ resource "aws_ecs_task_definition" "ui" {
}
}
])
tags = local.tags
}
resource "aws_ecs_service" "ui" {
@ -344,4 +405,6 @@ resource "aws_ecs_service" "ui" {
aws_lb_listener.http,
aws_lb_listener.https,
]
tags = local.tags
}

View File

@ -0,0 +1,46 @@
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.
provider "registry.terraform.io/hashicorp/aws" {
version = "5.100.0"
constraints = "~> 5.60"
hashes = [
"h1:Ijt7pOlB7Tr7maGQIqtsLFbl7pSMIj06TVdkoSBcYOw=",
"zh:054b8dd49f0549c9a7cc27d159e45327b7b65cf404da5e5a20da154b90b8a644",
"zh:0b97bf8d5e03d15d83cc40b0530a1f84b459354939ba6f135a0086c20ebbe6b2",
"zh:1589a2266af699cbd5d80737a0fe02e54ec9cf2ca54e7e00ac51c7359056f274",
"zh:6330766f1d85f01ae6ea90d1b214b8b74cc8c1badc4696b165b36ddd4cc15f7b",
"zh:7c8c2e30d8e55291b86fcb64bdf6c25489d538688545eb48fd74ad622e5d3862",
"zh:99b1003bd9bd32ee323544da897148f46a527f622dc3971af63ea3e251596342",
"zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
"zh:9f8b909d3ec50ade83c8062290378b1ec553edef6a447c56dadc01a99f4eaa93",
"zh:aaef921ff9aabaf8b1869a86d692ebd24fbd4e12c21205034bb679b9caf883a2",
"zh:ac882313207aba00dd5a76dbd572a0ddc818bb9cbf5c9d61b28fe30efaec951e",
"zh:bb64e8aff37becab373a1a0cc1080990785304141af42ed6aa3dd4913b000421",
"zh:dfe495f6621df5540d9c92ad40b8067376350b005c637ea6efac5dc15028add4",
"zh:f0ddf0eaf052766cfe09dea8200a946519f653c384ab4336e2a4a64fdd6310e9",
"zh:f1b7e684f4c7ae1eed272b6de7d2049bb87a0275cb04dbb7cda6636f600699c9",
"zh:ff461571e3f233699bf690db319dfe46aec75e58726636a0d97dd9ac6e32fb70",
]
}
provider "registry.terraform.io/hashicorp/random" {
version = "3.9.0"
constraints = "~> 3.6"
hashes = [
"h1:OO+IuvQJSPmWdN8AyyIEvPJbLvDQpgX/zbktoa9KsJE=",
"zh:161ad0bd9a75768c82f53fb6e7172a9d8be2d4889b012645a34795031aaf1bf1",
"zh:19dc9a5b17729725ccfc4f45b0500af0ee5bc6b6b160c7adb8f2bf617d2c80ea",
"zh:269eda8fe42daa7974d5a34d166c3ba9defe80cde86c01e4dadcfdf2e1f05e5f",
"zh:373f7c65566f8f2cc7f45d698654feb9d988996957e1266a69ca00c52d6d16d0",
"zh:5599d16804c41c83009ec621b6d6b6f74e102f5827678a4750f8809055546b61",
"zh:583be0440469a22bff70dcfa56593b01566860b29607437264adb51060cf46fc",
"zh:5f211d8ec3f2e1f414870d9584bfe26e6995560ef81c748f8447a48164767398",
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
"zh:7b547fd16216761ef86efc3ed516ac5ac0c5c42b7c7eb24a08cef2d93f69ed5e",
"zh:7e7c0679daf2a382151d05068c8c3f0dae6b7b7dccf818827b73dd08638df2ef",
"zh:8089dec888a8038b9b4fb23b3df7e1057293dbc5b60b42cc47ff690d69d4b61b",
"zh:c51f15a031edfd6f23ce8ced3446ca7f8d8d647e2499890d7d5d10d5016d7257",
"zh:c94784f005708890dc6895afd53636ec00ec1e430b15d41e5aebfb1d4b39bd04",
]
}

View File

@ -0,0 +1,41 @@
# One-command deploy of the LiteLLM AWS stack.
#
# cd terraform/litellm/aws/examples/default
# cp terraform.tfvars.example terraform.tfvars # edit it
# terraform init
# terraform apply
#
# This root just wires the provider (see providers.tf) to the module. The
# module itself (../../) declares no provider, so it can also be consumed
# from your own config with count/for_each/aliased or assume-role providers:
#
# module "litellm" {
# source = "github.com/BerriAI/litellm//terraform/litellm/aws?ref=<tag>"
# ...
# }
#
# Knobs not surfaced as variables here (per-component sizing, autoscaling,
# RDS/Redis tuning) can be set directly on this block see ../../variables.tf.
module "litellm" {
source = "../../"
region = var.region
tenant = var.tenant
env = var.env
azs = var.azs
litellm_master_key = var.litellm_master_key
litellm_license = var.litellm_license
ui_password = var.ui_password
acm_certificate_arn = var.acm_certificate_arn
allow_plaintext_alb = var.allow_plaintext_alb
s3_force_destroy = var.s3_force_destroy
skip_final_snapshot = var.skip_final_snapshot
proxy_config = var.proxy_config
gateway_extra_env = var.gateway_extra_env
backend_extra_env = var.backend_extra_env
gateway_extra_secrets = var.gateway_extra_secrets
backend_extra_secrets = var.backend_extra_secrets
}

View File

@ -0,0 +1,54 @@
output "alb_dns_name" {
description = "Public DNS name of the LiteLLM ALB."
value = module.litellm.alb_dns_name
}
output "alb_url" {
description = "Proxy URL. Dashboard at /, API at /v1/*."
value = module.litellm.alb_url
}
output "ecs_cluster" {
description = "ECS cluster name."
value = module.litellm.ecs_cluster
}
output "aurora_writer_endpoint" {
description = "Aurora writer endpoint."
value = module.litellm.aurora_writer_endpoint
}
output "aurora_reader_endpoint" {
description = "Aurora reader endpoint."
value = module.litellm.aurora_reader_endpoint
}
output "redis_endpoint" {
description = "ElastiCache Redis primary endpoint (TLS)."
value = module.litellm.redis_endpoint
}
output "s3_bucket" {
description = "S3 bucket name."
value = module.litellm.s3_bucket
}
output "master_key_secret_arn" {
description = "Secrets Manager ARN holding LITELLM_MASTER_KEY."
value = module.litellm.master_key_secret_arn
}
output "db_master_password_secret_arn" {
description = "Secrets Manager ARN holding the Aurora master credentials (bootstrap-only)."
value = module.litellm.db_master_password_secret_arn
}
output "db_bootstrap_sql" {
description = "Run once as the master DB user to create the IAM-authed app user."
value = module.litellm.db_bootstrap_sql
}
output "migration_run_command" {
description = "Break-glass command to re-run the one-off prisma migration task."
value = module.litellm.migration_run_command
}

View File

@ -0,0 +1,24 @@
# The provider is configured HERE, in the root, not in the module. That is
# the whole point of the split: a module that declares its own configured
# `provider` block can't be called with count/for_each/depends_on and gives
# the caller no way to set assume-role, custom endpoints, or aliases.
#
# `default_tags` set here still flow into every resource the module creates
# (provider default_tags propagate through module calls) and merge with the
# module's own `litellm:stack` / `managed-by` / var.tags. Use this block for
# org-wide tags; use the module's `tags` input for per-deployment tags.
provider "aws" {
region = var.region
# Reserve `default_tags` for pure org-wide tags the module shouldn't know
# about (cost center, team, compliance scope, ). They propagate through the
# module call and merge with the module's own `litellm:stack` / `managed-by`
# / var.tags. The module already stamps `managed-by = "terraform"`, so don't
# duplicate it here set per-deployment tags via the module's `tags` input.
#
# default_tags {
# tags = {
# "cost-center" = "platform"
# }
# }
}

View File

@ -23,22 +23,17 @@ env = "stage"
# allow_plaintext_alb = true
# Storage retention: false (default) makes `terraform destroy` refuse on a
# non-empty bucket. Flip to true only for ephemeral / CI stacks.
# s3_force_destroy = false
# non-empty bucket / take an Aurora final snapshot. Flip to true only for
# ephemeral / CI stacks where you accept losing the data.
# s3_force_destroy = false
# skip_final_snapshot = false
# Component images. Defaults pin all four to the same GHCR release tag —
# bump them together when bumping LiteLLM. Override here to pull from a
# private registry or to mix-and-match versions.
# gateway_image = "ghcr.io/berriai/litellm-gateway:1.86.0-dev"
# backend_image = "ghcr.io/berriai/litellm-backend:1.86.0-dev"
# ui_image = "ghcr.io/berriai/litellm-ui:1.86.0-dev"
# migrations_image = "ghcr.io/berriai/litellm-migrations:1.86.0-dev"
# Per-task sizing for the gateway. Defaults are 1 vCPU / 4 GiB / 1 worker.
# uvicorn rule of thumb for CPU-bound work is (2 * vCPU) + 1 workers.
# gateway_cpu = 1024 # 1024 = 1 vCPU
# gateway_memory = 4096 # MiB
# gateway_num_workers = 1
# Component images and per-task sizing/autoscaling are NOT exposed as
# variables in this example (it keeps the curated surface small). They
# default to working public GHCR images. To pin images or tune
# CPU/memory/workers/autoscaling, set those inputs directly on the
# `module "litellm"` block in main.tf — the full list is in
# ../../variables.tf — or call the module from your own root config.
# ---------- proxy_config (mirrors helm gateway.config.proxy_config) ----------
# proxy_config = {
@ -86,3 +81,13 @@ env = "stage"
# OPENAI_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:openai-api-key-AbCdEf"
# ANTHROPIC_API_KEY = "arn:aws:secretsmanager:us-west-2:111122223333:secret:anthropic-api-key-GhIjKl"
# }
# ---------- OpenTelemetry v2 ----------
# OTel is gated on otel_endpoint: empty (default) and nothing is added to
# the container env; set it and both gateway and backend gain
# LITELLM_OTEL_V2=true plus the OTEL_* block (with OTEL_SERVICE_NAME
# stamped per component). The knobs aren't surfaced as wrapper vars in
# this example; set them directly on the `module "litellm"` block in
# main.tf (otel_endpoint, otel_exporter, otel_environment_name,
# otel_capture_message_content, otel_headers_secret_arn). Full docs in
# ../../variables.tf.

View File

@ -0,0 +1,104 @@
# Curated surface for the one-command deploy path. The module (../../)
# exposes far more knobs (per-component CPU/memory, autoscaling, RDS/Redis
# sizing, ). To tune those, set them directly on the `module "litellm"`
# block in main.tf, or call the module from your own root config. Full
# per-variable docs live in ../../variables.tf the module is the source
# of truth; descriptions here are intentionally terse.
variable "region" {
description = "AWS region to deploy into."
type = string
}
variable "tenant" {
description = "Tenant slug — prefix for every resource (<tenant>-litellm-<env>)."
type = string
}
variable "env" {
description = "Environment suffix (stage, prod, dev)."
type = string
}
variable "azs" {
description = "Availability zones for subnets. At least 2 (RDS + ALB)."
type = list(string)
}
# Sensitive prefer TF_VAR_litellm_master_key / TF_VAR_litellm_license /
# TF_VAR_ui_password so values stay out of any committed tfvars file.
variable "litellm_master_key" {
description = "Pre-existing LITELLM_MASTER_KEY (sk-…). Empty → auto-generated."
type = string
default = ""
sensitive = true
}
variable "litellm_license" {
description = "LiteLLM enterprise license. Empty → OSS-only."
type = string
default = ""
sensitive = true
}
variable "ui_password" {
description = "UI admin password. Empty → falls back to LITELLM_MASTER_KEY."
type = string
default = ""
sensitive = true
}
# TLS provide an ACM cert for production, or opt into HTTP-only for dev.
variable "acm_certificate_arn" {
description = "ACM cert ARN for the ALB HTTPS listener. Empty → no TLS."
type = string
default = ""
}
variable "allow_plaintext_alb" {
description = "Opt into HTTP-only ALB (trial/dev only)."
type = bool
default = false
}
variable "s3_force_destroy" {
description = "Allow destroy of a non-empty S3 bucket (ephemeral/CI only)."
type = bool
default = false
}
variable "skip_final_snapshot" {
description = "Skip the Aurora final snapshot on destroy (ephemeral/CI only)."
type = bool
default = false
}
variable "proxy_config" {
description = "LiteLLM proxy config (contents of config.yaml). Empty → defaults."
type = any
default = {}
}
variable "gateway_extra_env" {
description = "Plain-text env vars layered onto the gateway."
type = map(string)
default = {}
}
variable "backend_extra_env" {
description = "Plain-text env vars layered onto the backend."
type = map(string)
default = {}
}
variable "gateway_extra_secrets" {
description = "Gateway env vars sourced from Secrets Manager (name → ARN)."
type = map(string)
default = {}
}
variable "backend_extra_secrets" {
description = "Backend env vars sourced from Secrets Manager (name → ARN)."
type = map(string)
default = {}
}

View File

@ -0,0 +1,14 @@
terraform {
required_version = ">= 1.6.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.60"
}
random = {
source = "hashicorp/random"
version = "~> 3.6"
}
}
}

View File

@ -13,6 +13,8 @@ data "aws_iam_policy_document" "task_assume" {
resource "aws_iam_role" "task_execution" {
name = "${local.name}-task-execution"
assume_role_policy = data.aws_iam_policy_document.task_assume.json
tags = local.tags
}
resource "aws_iam_role_policy_attachment" "task_execution" {
@ -52,6 +54,7 @@ data "aws_iam_policy_document" "secrets_access" {
aws_secretsmanager_secret.license[*].arn,
aws_secretsmanager_secret.ui_password[*].arn,
local.extra_secret_arns,
var.otel_headers_secret_arn == "" ? [] : [var.otel_headers_secret_arn],
)
}
}
@ -59,6 +62,8 @@ data "aws_iam_policy_document" "secrets_access" {
resource "aws_iam_policy" "secrets_access" {
name = "${local.name}-secrets-access"
policy = data.aws_iam_policy_document.secrets_access.json
tags = local.tags
}
resource "aws_iam_role_policy_attachment" "task_execution_secrets" {
@ -75,6 +80,8 @@ resource "aws_iam_role_policy_attachment" "task_execution_secrets" {
resource "aws_iam_role" "task" {
name = "${local.name}-task"
assume_role_policy = data.aws_iam_policy_document.task_assume.json
tags = local.tags
}
data "aws_caller_identity" "current" {}
@ -91,6 +98,8 @@ data "aws_iam_policy_document" "rds_iam_connect" {
resource "aws_iam_policy" "rds_iam_connect" {
name = "${local.name}-rds-iam-connect"
policy = data.aws_iam_policy_document.rds_iam_connect.json
tags = local.tags
}
resource "aws_iam_role_policy_attachment" "task_rds_iam_connect" {
@ -111,4 +120,6 @@ resource "aws_iam_role_policy_attachment" "task_rds_iam_connect" {
resource "aws_iam_role" "ui_task" {
name = "${local.name}-ui-task"
assume_role_policy = data.aws_iam_policy_document.task_assume.json
tags = local.tags
}

View File

@ -11,6 +11,20 @@ locals {
# the stack can reference local.name.
name = "${var.tenant}-litellm-${var.env}"
# This is a reusable module it declares no `provider` block, so the AWS
# provider's `default_tags` is the caller's concern, not ours. To keep the
# same per-resource tagging the stack had when it owned the provider, the
# module threads `local.tags` onto every taggable resource itself. Callers
# may layer org-wide tags on top via their own provider `default_tags`
# (those merge with these). `var.tags` is the per-deployment override.
tags = merge(
{
"litellm:stack" = local.name
"managed-by" = "terraform"
},
var.tags,
)
gateway_path_prefixes = [
"/v1/chat/*", "/chat/*",
"/v1/completions*", "/completions*",

View File

@ -42,4 +42,6 @@ resource "aws_ecs_task_definition" "migrations" {
}
}
}])
tags = local.tags
}

View File

@ -7,12 +7,12 @@ resource "aws_vpc" "this" {
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = local.name }
tags = merge(local.tags, { Name = local.name })
}
resource "aws_internet_gateway" "this" {
vpc_id = aws_vpc.this.id
tags = { Name = local.name }
tags = merge(local.tags, { Name = local.name })
}
# Public subnets (ALB + NAT). One per AZ.
@ -23,7 +23,7 @@ resource "aws_subnet" "public" {
availability_zone = var.azs[count.index]
map_public_ip_on_launch = true
tags = { Name = "${local.name}-public-${var.azs[count.index]}" }
tags = merge(local.tags, { Name = "${local.name}-public-${var.azs[count.index]}" })
}
# Private subnets (ECS tasks, RDS, ElastiCache). One per AZ, separate from
@ -34,12 +34,12 @@ resource "aws_subnet" "private" {
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = var.azs[count.index]
tags = { Name = "${local.name}-private-${var.azs[count.index]}" }
tags = merge(local.tags, { Name = "${local.name}-private-${var.azs[count.index]}" })
}
resource "aws_eip" "nat" {
domain = "vpc"
tags = { Name = "${local.name}-nat" }
tags = merge(local.tags, { Name = "${local.name}-nat" })
depends_on = [aws_internet_gateway.this]
}
@ -50,7 +50,7 @@ resource "aws_nat_gateway" "this" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
tags = { Name = local.name }
tags = merge(local.tags, { Name = local.name })
depends_on = [aws_internet_gateway.this]
}
@ -63,7 +63,7 @@ resource "aws_route_table" "public" {
gateway_id = aws_internet_gateway.this.id
}
tags = { Name = "${local.name}-public" }
tags = merge(local.tags, { Name = "${local.name}-public" })
}
resource "aws_route_table_association" "public" {
@ -80,7 +80,7 @@ resource "aws_route_table" "private" {
nat_gateway_id = aws_nat_gateway.this.id
}
tags = { Name = "${local.name}-private" }
tags = merge(local.tags, { Name = "${local.name}-private" })
}
resource "aws_route_table_association" "private" {
@ -119,6 +119,8 @@ resource "aws_security_group" "alb" {
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = local.tags
}
resource "aws_security_group" "tasks" {
@ -141,6 +143,8 @@ resource "aws_security_group" "tasks" {
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = local.tags
}
resource "aws_security_group" "rds" {
@ -155,6 +159,8 @@ resource "aws_security_group" "rds" {
protocol = "tcp"
security_groups = [aws_security_group.tasks.id]
}
tags = local.tags
}
resource "aws_security_group" "redis" {
@ -169,4 +175,6 @@ resource "aws_security_group" "redis" {
protocol = "tcp"
security_groups = [aws_security_group.tasks.id]
}
tags = local.tags
}

View File

@ -1,13 +0,0 @@
provider "aws" {
region = var.region
default_tags {
tags = merge(
{
"litellm:stack" = local.name
"managed-by" = "terraform"
},
var.tags,
)
}
}

View File

@ -19,12 +19,16 @@
resource "aws_db_subnet_group" "this" {
name = "${local.name}-db"
subnet_ids = aws_subnet.private[*].id
tags = local.tags
}
resource "aws_rds_cluster_parameter_group" "this" {
name = "${local.name}-cluster-pg"
family = "aurora-postgresql${split(".", var.db_engine_version)[0]}"
description = "LiteLLM Aurora Postgres cluster parameters."
tags = local.tags
}
resource "aws_rds_cluster" "this" {
@ -52,6 +56,8 @@ resource "aws_rds_cluster" "this" {
backup_retention_period = 7
preferred_backup_window = "07:00-09:00"
tags = local.tags
}
resource "aws_rds_cluster_instance" "writer" {
@ -67,6 +73,8 @@ resource "aws_rds_cluster_instance" "writer" {
# Promotion tier 0 first in line during failover, so this instance stays
# the writer unless it goes unhealthy.
promotion_tier = 0
tags = local.tags
}
resource "aws_rds_cluster_instance" "reader" {
@ -82,4 +90,6 @@ resource "aws_rds_cluster_instance" "reader" {
# Higher promotion tier won't be picked as writer during a failover
# unless the writer instance itself is gone.
promotion_tier = 15
tags = local.tags
}

View File

@ -1,6 +1,8 @@
resource "aws_elasticache_subnet_group" "this" {
name = "${local.name}-redis"
subnet_ids = aws_subnet.private[*].id
tags = local.tags
}
# Replication group (not aws_elasticache_cluster, which is the
@ -30,4 +32,6 @@ resource "aws_elasticache_replication_group" "this" {
transit_encryption_enabled = true
apply_immediately = true
tags = local.tags
}

View File

@ -18,6 +18,8 @@ resource "aws_s3_bucket" "this" {
# cached responses, archived request logs, and /v1/files storage stay put.
# Flip to true only for ephemeral / CI stacks (`var.s3_force_destroy`).
force_destroy = var.s3_force_destroy
tags = local.tags
}
resource "aws_s3_bucket_versioning" "this" {
@ -72,9 +74,30 @@ data "aws_iam_policy_document" "s3_access" {
resource "aws_iam_policy" "s3_access" {
name = "${local.name}-s3-access"
policy = data.aws_iam_policy_document.s3_access.json
tags = local.tags
}
resource "aws_iam_role_policy_attachment" "task_s3_access" {
role = aws_iam_role.task.name
policy_arn = aws_iam_policy.s3_access.arn
}
# proxy_config is uploaded as an S3 object so the gateway and backend
# containers can fetch it at startup instead of carrying the YAML inline
# as a base64 env var. ECS Fargate has no native S3 volume type, so
# "mount" here is: container entrypoint runs a boto3 download_file into
# /tmp/litellm-config.yaml before exec'ing uvicorn. The task role already
# has s3:GetObject on this bucket via aws_iam_policy.s3_access.
#
# etag flows into the task definition (see locals.proxy_config_env in
# ecs.tf) so a config edit produces a new task-def revision and ECS rolls
# both services automatically.
resource "aws_s3_object" "proxy_config" {
count = length(keys(var.proxy_config)) > 0 ? 1 : 0
bucket = aws_s3_bucket.this.id
key = "config/litellm-config.yaml"
content = yamlencode(var.proxy_config)
content_type = "application/yaml"
}

View File

@ -22,6 +22,8 @@ resource "aws_secretsmanager_secret" "master_key" {
name = "${local.name}-master-key"
description = "LITELLM_MASTER_KEY for gateway + backend."
recovery_window_in_days = 0
tags = local.tags
}
resource "aws_secretsmanager_secret_version" "master_key" {
@ -40,6 +42,8 @@ resource "aws_secretsmanager_secret" "license" {
name = "${local.name}-license"
description = "LITELLM_LICENSE for gateway + backend."
recovery_window_in_days = 0
tags = local.tags
}
resource "aws_secretsmanager_secret_version" "license" {
@ -59,6 +63,8 @@ resource "aws_secretsmanager_secret" "ui_password" {
name = "${local.name}-ui-password"
description = "UI_PASSWORD for the backend (UI admin login)."
recovery_window_in_days = 0
tags = local.tags
}
resource "aws_secretsmanager_secret_version" "ui_password" {
@ -72,6 +78,8 @@ resource "aws_secretsmanager_secret" "db_master_password" {
name = "${local.name}-db-master-password"
description = "Aurora master-user password - bootstrap only. Runtime auth is IAM-token."
recovery_window_in_days = 0
tags = local.tags
}
resource "aws_secretsmanager_secret_version" "db_master_password" {

View File

@ -24,7 +24,7 @@ variable "env" {
}
variable "tags" {
description = "Additional tags merged into the provider default_tags."
description = "Per-deployment tags applied to every taggable resource the module creates, on top of the module's own `litellm:stack` / `managed-by` tags. Caller-level provider `default_tags` (if any) merge with these."
type = map(string)
default = {}
}
@ -420,10 +420,12 @@ variable "backend_extra_secrets" {
variable "proxy_config" {
description = <<-EOT
LiteLLM proxy config (the contents of config.yaml). Mirrors the helm
chart's `gateway.config.proxy_config` value. Passed to gateway, backend,
and the migration task as a base64-encoded env var and decoded to
/tmp/litellm-config.yaml at container start; CONFIG_FILE_PATH is set
automatically.
chart's `gateway.config.proxy_config` value. Uploaded to S3 under
`config/litellm-config.yaml` in the stack's bucket; gateway and backend
container entrypoints download it to /tmp/litellm-config.yaml at task
start (CONFIG_FILE_PATH is set automatically). The S3 object's etag is
wired into the task definition, so editing this value produces a new
task-def revision and a rolling redeploy.
Example:
proxy_config = {
@ -456,3 +458,78 @@ variable "log_retention_days" {
type = number
default = 30
}
# ---------- OpenTelemetry v2 ----------
#
# https://docs.litellm.ai/docs/observability/opentelemetry_v2
#
# OTel v2 is opt-in and gated entirely on otel_endpoint, matching the GCP
# stack. Leave otel_endpoint = "" and nothing OTel-related lands in the
# container env. Set it and the gateway and backend gain LITELLM_OTEL_V2=true
# plus the OTEL_* block (per-component OTEL_SERVICE_NAME, exporter, endpoint,
# environment name, capture-content), with OTEL_HEADERS sourced from
# otel_headers_secret_arn when provided.
variable "otel_endpoint" {
description = <<-EOT
OTLP collector endpoint (sets OTEL_ENDPOINT). Empty disables OTel
entirely (no LITELLM_OTEL_V2, no OTEL_* env). Point at any
OTLP-compatible backend (self-hosted collector, Grafana Tempo,
Honeycomb, Datadog). Example: "http://otel-collector.internal:4318"
for OTLP/HTTP.
EOT
type = string
default = ""
}
variable "otel_exporter" {
description = <<-EOT
OTLP exporter protocol. One of "otlp_http", "otlp_grpc", or "console"
(stdout, useful for verifying instrumentation against CloudWatch logs).
Ignored when otel_endpoint is empty.
EOT
type = string
default = "otlp_http"
validation {
condition = contains(["otlp_http", "otlp_grpc", "console"], var.otel_exporter)
error_message = "otel_exporter must be one of: otlp_http, otlp_grpc, console."
}
}
variable "otel_environment_name" {
description = <<-EOT
Value for OTEL_ENVIRONMENT_NAME (becomes `deployment.environment` on
every span). Defaults to var.env when empty so spans land tagged with
the deployment env without extra wiring.
EOT
type = string
default = ""
}
variable "otel_capture_message_content" {
description = <<-EOT
Value for OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT. Default
`no_content` matches the litellm default; flip to `prompt_and_completion`
only when you've audited what's about to land in your observability
backend, because raw prompts/completions are typically sensitive.
EOT
type = string
default = "no_content"
validation {
condition = contains(["no_content", "prompt_and_completion"], var.otel_capture_message_content)
error_message = "otel_capture_message_content must be one of: no_content, prompt_and_completion."
}
}
variable "otel_headers_secret_arn" {
description = <<-EOT
Secrets Manager ARN whose plaintext value becomes OTEL_HEADERS
(comma-separated `key=value` pairs, typically used to pass an API key
header to a managed collector). The execution role auto-gains
secretsmanager:GetSecretValue on this ARN. Empty omits OTEL_HEADERS.
EOT
type = string
default = ""
}

View File

@ -1,5 +1,9 @@
# LiteLLM on GCP (Cloud Run)
[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.svg)](https://ssh.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https%3A%2F%2Fgithub.com%2FBerriAI%2Flitellm&cloudshell_workspace=terraform%2Flitellm%2Fgcp%2Fexamples%2Fdefault&cloudshell_tutorial=TUTORIAL.md&cloudshell_image=gcr.io/ds-artifacts-cloudshell/deploystack_custom_image&shellonly=true)
The button above opens the [DeployStack](https://github.com/GoogleCloudPlatform/deploystack) installer in Cloud Shell, walks you through `TUTORIAL.md`, and runs `terraform apply` once you've answered the prompts. The rest of this README is the manual / advanced path.
Deploys the componentized LiteLLM proxy on GCP:
- **VPC** + Private Services Access range + a Serverless VPC Access connector
@ -25,10 +29,14 @@ and `litellm-migrations` (slim image used only by the one-off Cloud Run
Job — runs `prisma migrate deploy` against the writer DB and exits).
Bump them together when bumping LiteLLM.
Cloud Run only accepts images from Artifact Registry, `[region.]gcr.io`,
or `docker.io``ghcr.io` URIs are rejected at apply time. The four
images are published to GHCR upstream, so any real deploy needs an
Artifact Registry remote repository pointed at GHCR.
**Required override.** The `image_registry` default (`ghcr.io/berriai`)
does **not** work as-is — Cloud Run only accepts images from Artifact
Registry, `[region.]gcr.io`, or `docker.io`, and rejects `ghcr.io` URIs
at apply time. Every deploy (including HCP Terraform 1-click) must
supply either `image_registry` pointed at an Artifact Registry remote
repo backed by GHCR, or full per-component `*_image` URIs against
images you've already mirrored. The default is present only so
`terraform plan` succeeds during local iteration.
**One-time setup (per project):** create a remote repo and let Cloud Run
pull through it.
@ -102,9 +110,13 @@ Unix socket.
### `proxy_config`
Mirrors the helm chart's `gateway.config.proxy_config`. The map is
YAML-encoded and base64-passed to gateway, backend, and the migration job;
each container decodes it to `/tmp/litellm-config.yaml` at startup and sets
`CONFIG_FILE_PATH`.
YAML-encoded and uploaded to a dedicated GCS bucket as `config.yaml`, then
mounted read-only into the gateway and backend at `/etc/litellm` via Cloud
Run v2's gcsfuse volume. `CONFIG_FILE_PATH` points at the mount path. A
hash of the YAML rides along as an env var so an edit to `proxy_config`
forces a new Cloud Run revision; without it the new file would sit in the
bucket unread until the next unrelated revision rollover. The migrations
job doesn't get the config (it only runs `prisma migrate deploy`).
```hcl
proxy_config = {
@ -160,6 +172,38 @@ reject the version suffix; version is always resolved as `latest`. If
you need a pinned version, edit `local.gateway_extra_secret_kv` in
`cloudrun.tf` directly to set `version = "3"` for the entry in question.
### OpenTelemetry v2
OTel v2 (https://docs.litellm.ai/docs/observability/opentelemetry_v2) is
opt-in and gated entirely on `otel_endpoint`. Empty (default) and nothing
OTel-related lands in the container env. Set it and both gateway and
backend gain `LITELLM_OTEL_V2=true` plus the `OTEL_*` block, with
`OTEL_SERVICE_NAME` stamped per component (`${tenant}-litellm-${env}-gateway`
and `-backend`) so spans land tagged with the right hop. Any `OTEL_*` key
set in `gateway_extra_env` / `backend_extra_env` overrides the default for
that service (Cloud Run rejects duplicate env names, so the override is
predictable).
```hcl
otel_endpoint = "https://otel.example.com:4318"
otel_exporter = "otlp_http" # or otlp_grpc
otel_environment_name = "prod" # default: var.env
otel_headers_secret = "projects/my-gcp-project/secrets/otel-headers"
```
`OTEL_HEADERS` is wired as a Secret Manager `secret_key_ref` since it
typically carries the collector's auth token; create the secret with the
literal header string, e.g. `Authorization=Bearer <token>`.
`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` defaults to
`no_content`; flip `otel_capture_message_content = "prompt_and_completion"`
only after auditing what lands in the backend, since prompts and
completions are typically sensitive.
Behavior matches the AWS stack 1:1; the only naming differences are
`otel_headers_secret` (a Secret Manager resource ID) vs AWS's
`otel_headers_secret_arn` (a Secrets Manager ARN).
## Tenant deployment
Every resource the stack creates is named `${tenant}-litellm-${env}` (or
@ -173,20 +217,25 @@ pair differs:
| `acme` | `prod` | `acme-litellm-prod-master-key` |
| `globex` | `dev` | `globex-litellm-dev-license` |
For a per-tenant instance, the only inputs that change are the tenant
slug, env, and the two pre-issued secrets:
For a per-tenant instance via the example root, the only inputs that
change are the tenant slug, env, and the two pre-issued secrets:
```bash
cd terraform/litellm/gcp/examples/default
export TF_VAR_litellm_master_key="sk-..." # the tenant's master key
export TF_VAR_litellm_license="lic-..." # their LITELLM_LICENSE
terraform apply \
-var "project=my-gcp-project" \
-var "project_id=my-gcp-project" \
-var "region=us-central1" \
-var "tenant=acme" \
-var "env=stage"
```
To run *many* tenants from a single config, call the module with
`for_each` instead of one root per tenant — only possible because the
module declares no provider block (see "Using as a module").
Both `litellm_master_key` and `litellm_license` are optional:
- Omit `litellm_master_key` → the stack auto-generates a random `sk-…`
value (trial/dev path).
@ -200,14 +249,22 @@ example files.
## Quick start
```bash
cd terraform/litellm/gcp
cd terraform/litellm/gcp/examples/default
cp terraform.tfvars.example terraform.tfvars
# Edit: project, region, tenant, env, *_image, proxy_config, gateway_extra_secrets.
# Edit: project, region, tenant, env, image_registry, proxy_config, gateway_extra_secrets.
terraform init
terraform apply
```
`examples/default/` is a thin root that configures the `google` /
`google-beta` providers and calls the module (`../../`). It exposes a
curated variable surface; for advanced knobs (per-component
CPU/memory/instances, Cloud SQL tier/edition, Memorystore tier,
per-component image pins) set them on the `module "litellm"` block in
`examples/default/main.tf`, or call the module from your own config — see
"Using as a module" below.
That single apply provisions everything, runs the prisma schema migration via
the Cloud Run job (auto-triggered by `bootstrap.tf`), and only then starts the
gateway/backend services. When it returns, the stack is serving traffic.
@ -251,6 +308,56 @@ Set `allow_plaintext_lb = true` and leave `lb_domains = []`. Without the
flag, plan fails with a clear error pointing at the precondition.
Intended for short-lived trial / dev stacks only.
## Using as a module
The directory itself is a module with **no `provider` block** — the caller
owns provider config. You can call it directly with `for_each` (many
tenants from one config), `count`, `depends_on`, or providers configured
to impersonate a service account / target a different project:
```hcl
provider "google" {
project = "my-gcp-project"
region = "us-central1"
}
provider "google-beta" {
project = "my-gcp-project"
region = "us-central1"
}
module "litellm" {
source = "github.com/BerriAI/litellm//terraform/litellm/gcp?ref=<tag>"
project = "my-gcp-project"
region = "us-central1"
tenant = "acme"
env = "prod"
# ...any of the inputs in variables.tf...
}
```
Both the default `google` and `google-beta` configs are inherited by the
module automatically through the call; declare both in the caller.
Labels: the module stamps its own `litellm-stack` and `managed-by` labels
onto every label-supporting resource (Cloud Run services and the
migrations job, Cloud SQL writer and reader, Memorystore, Secret Manager
entries, GCS buckets, the LB global address and forwarding rules) and
merges `var.labels` on top. Use the `labels` input for per-deployment
labels; mirrors the AWS stack's `tags` input.
**`for_each` shares one provider config.** The module's `versions.tf` declares
`google` / `google-beta` *without* `configuration_aliases`, so it only ever
receives the caller's single default (unaliased) `google` / `google-beta`
providers. That's deliberate — it keeps the one-command path simple — but it
means a `for_each` over the module runs every instance against the **same
project, region, and credentials**. Use `for_each` for many tenants in one
project (distinct `tenant`/`env`); it cannot fan out across projects or regions
on its own. To deploy into separate projects/regions, give each its own root
with its own provider config (one `examples/default`-style root per project),
or fork the module to add `configuration_aliases` and pass per-instance
`providers = { ... }`.
## Storage and database retention
Two opt-in tripwires guard against accidental data loss on
@ -281,8 +388,8 @@ or point them at your own CA.
| File | What's in it |
| ----------------- | -------------------------------------------------------------------- |
| `versions.tf` | Terraform + provider version constraints |
| `providers.tf` | Google + Google-Beta providers |
| `versions.tf` | Terraform + `required_providers` constraints (module declares no provider config) |
| `examples/default/` | Thin root: `google` / `google-beta` providers + a call to the module. The one-command deploy path. |
| `variables.tf` | All input variables |
| `locals.tf` | Path-prefix lists (mirror of `helm/.../ingress.yaml`) + proxy_config helpers |
| `network.tf` | VPC, subnet, PSA range, Serverless VPC connector |

View File

@ -25,7 +25,7 @@ resource "terraform_data" "migration" {
environment = {
JOB = google_cloud_run_v2_job.migrations.name
REGION = var.region
PROJECT = var.project
PROJECT = var.project_id
}
command = <<-EOT
set -euo pipefail

View File

@ -26,6 +26,39 @@ locals {
{ name = "GCS_BUCKET_NAME", value = google_storage_bucket.this.name },
]
# OTel v2 is opt-in and gated on otel_endpoint, matching the AWS stack
# nothing OTel-related is added to the container env until an endpoint is
# set. LITELLM_OTEL_V2 flips on alongside the OTEL_* block so the proxy
# never boots the instrumentation with no exporter wired in.
otel_enabled = var.otel_endpoint != ""
otel_environment_name = var.otel_environment_name != "" ? var.otel_environment_name : var.env
otel_shared_endpoint_kv = local.otel_enabled ? [
{ name = "LITELLM_OTEL_V2", value = "true" },
{ name = "OTEL_EXPORTER", value = var.otel_exporter },
{ name = "OTEL_ENDPOINT", value = var.otel_endpoint },
{ name = "OTEL_ENVIRONMENT_NAME", value = local.otel_environment_name },
{ name = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", value = var.otel_capture_message_content },
] : []
# OTel defaults are filtered out when the same key appears in
# *_extra_env, so a caller-supplied OTEL_SERVICE_NAME (or any other
# OTEL_*) takes precedence without colliding at Cloud Run apply time
# (Cloud Run rejects duplicate env var names).
gateway_otel_env_kv_raw = concat(local.otel_shared_endpoint_kv, local.otel_enabled ? [
{ name = "OTEL_SERVICE_NAME", value = "${local.name}-gateway" },
] : [])
backend_otel_env_kv_raw = concat(local.otel_shared_endpoint_kv, local.otel_enabled ? [
{ name = "OTEL_SERVICE_NAME", value = "${local.name}-backend" },
] : [])
gateway_otel_env_kv = [
for e in local.gateway_otel_env_kv_raw : e if !contains(keys(var.gateway_extra_env), e.name)
]
backend_otel_env_kv = [
for e in local.backend_otel_env_kv_raw : e if !contains(keys(var.backend_extra_env), e.name)
]
otel_env_secrets = local.otel_enabled && var.otel_headers_secret != "" ? [
{ name = "OTEL_HEADERS", secret = var.otel_headers_secret, version = "latest" },
] : []
# Cloud Run v2 secret env vars use value_source.secret_key_ref pointing at a
# secret resource ID. Shared between gateway and backend (the migrations
# job has its own narrower env list see migrations_env_secrets below).
@ -63,13 +96,6 @@ locals {
for k, v in var.backend_extra_secrets : { name = k, secret = v, version = "latest" }
]
# Shell fragments composed with && so any failure short-circuits the
# whole startup instead of falling through to `exec uvicorn`. The
# python step is only included when the caller provided a proxy_config.
proxy_config_fragment = local.proxy_config_enabled ? [
"python -c \"import os, base64, pathlib; pathlib.Path(os.environ['CONFIG_FILE_PATH']).write_bytes(base64.b64decode(os.environ['LITELLM_PROXY_CONFIG_B64']))\""
] : []
# Decode the Memorystore CA cert (passed as REDIS_CA_PEM_B64) to the
# path REDIS_SSL_CA_CERTS points at, so the redis-py client can validate
# the rediss:// handshake.
@ -83,14 +109,12 @@ locals {
]
gateway_args = join(" && ", concat(
local.proxy_config_fragment,
local.redis_ca_fragment,
local.database_url_fragment,
["exec uvicorn gateway.main:app --host 0.0.0.0 --port 4000"],
["exec uvicorn gateway.main:app --host 0.0.0.0 --port 4000 --workers ${var.gateway_num_workers}"],
))
backend_args = join(" && ", concat(
local.proxy_config_fragment,
local.redis_ca_fragment,
local.database_url_fragment,
["exec uvicorn backend.main:app --host 0.0.0.0 --port 4001"],
@ -117,6 +141,7 @@ resource "google_cloud_run_v2_service" "gateway" {
name = "${local.name}-gateway"
location = var.region
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
labels = local.labels
template {
service_account = google_service_account.runtime.email
@ -149,7 +174,7 @@ resource "google_cloud_run_v2_service" "gateway" {
}
dynamic "env" {
for_each = concat(local.shared_env_kv, local.gateway_extra_env_kv, local.proxy_config_env)
for_each = concat(local.shared_env_kv, local.gateway_otel_env_kv, local.gateway_extra_env_kv, local.proxy_config_env)
content {
name = env.value.name
value = env.value.value
@ -157,7 +182,7 @@ resource "google_cloud_run_v2_service" "gateway" {
}
dynamic "env" {
for_each = concat(local.shared_env_secrets, local.gateway_extra_secret_kv)
for_each = concat(local.shared_env_secrets, local.otel_env_secrets, local.gateway_extra_secret_kv)
content {
name = env.value.name
value_source {
@ -169,6 +194,14 @@ resource "google_cloud_run_v2_service" "gateway" {
}
}
dynamic "volume_mounts" {
for_each = local.proxy_config_enabled ? [1] : []
content {
name = local.proxy_config_volume
mount_path = local.proxy_config_mount_path
}
}
startup_probe {
http_get {
path = "/health/readiness"
@ -189,6 +222,17 @@ resource "google_cloud_run_v2_service" "gateway" {
timeout_seconds = 5
}
}
dynamic "volumes" {
for_each = local.proxy_config_enabled ? [1] : []
content {
name = local.proxy_config_volume
gcs {
bucket = google_storage_bucket.proxy_config[0].name
read_only = true
}
}
}
}
depends_on = [
@ -196,6 +240,8 @@ resource "google_cloud_run_v2_service" "gateway" {
google_secret_manager_secret_iam_member.db_password,
google_secret_manager_secret_iam_member.license,
google_secret_manager_secret_iam_member.extras,
google_secret_manager_secret_iam_member.otel_headers,
google_storage_bucket_iam_member.proxy_config_runtime,
google_sql_user.app,
# Don't go live until the schema is migrated; otherwise the proxy boots,
# fails on missing tables, and Cloud Run keeps cold-restarting.
@ -208,6 +254,7 @@ resource "google_cloud_run_v2_service" "backend" {
name = "${local.name}-backend"
location = var.region
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
labels = local.labels
template {
service_account = google_service_account.runtime.email
@ -240,7 +287,7 @@ resource "google_cloud_run_v2_service" "backend" {
}
dynamic "env" {
for_each = concat(local.shared_env_kv, local.backend_default_env_kv, local.backend_extra_env_kv, local.proxy_config_env)
for_each = concat(local.shared_env_kv, local.backend_default_env_kv, local.backend_otel_env_kv, local.backend_extra_env_kv, local.proxy_config_env)
content {
name = env.value.name
value = env.value.value
@ -248,7 +295,7 @@ resource "google_cloud_run_v2_service" "backend" {
}
dynamic "env" {
for_each = concat(local.shared_env_secrets, local.backend_managed_env_secrets, local.backend_extra_secret_kv)
for_each = concat(local.shared_env_secrets, local.backend_managed_env_secrets, local.otel_env_secrets, local.backend_extra_secret_kv)
content {
name = env.value.name
value_source {
@ -260,6 +307,14 @@ resource "google_cloud_run_v2_service" "backend" {
}
}
dynamic "volume_mounts" {
for_each = local.proxy_config_enabled ? [1] : []
content {
name = local.proxy_config_volume
mount_path = local.proxy_config_mount_path
}
}
startup_probe {
http_get {
path = "/health/readiness"
@ -280,6 +335,17 @@ resource "google_cloud_run_v2_service" "backend" {
timeout_seconds = 5
}
}
dynamic "volumes" {
for_each = local.proxy_config_enabled ? [1] : []
content {
name = local.proxy_config_volume
gcs {
bucket = google_storage_bucket.proxy_config[0].name
read_only = true
}
}
}
}
depends_on = [
@ -288,6 +354,8 @@ resource "google_cloud_run_v2_service" "backend" {
google_secret_manager_secret_iam_member.license,
google_secret_manager_secret_iam_member.ui_password,
google_secret_manager_secret_iam_member.extras,
google_secret_manager_secret_iam_member.otel_headers,
google_storage_bucket_iam_member.proxy_config_runtime,
google_sql_user.app,
terraform_data.migration,
]
@ -301,6 +369,7 @@ resource "google_cloud_run_v2_service" "ui" {
name = "${local.name}-ui"
location = var.region
ingress = "INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER"
labels = local.labels
template {
service_account = google_service_account.ui_runtime.email
@ -344,7 +413,7 @@ resource "google_cloud_run_v2_service" "ui" {
# (LITELLM_MASTER_KEY); these IAM bindings just open up Cloud Run's invoker
# gate so the LB request makes it to the container.
resource "google_cloud_run_v2_service_iam_member" "gateway_allusers" {
project = var.project
project = var.project_id
location = google_cloud_run_v2_service.gateway.location
name = google_cloud_run_v2_service.gateway.name
role = "roles/run.invoker"
@ -352,7 +421,7 @@ resource "google_cloud_run_v2_service_iam_member" "gateway_allusers" {
}
resource "google_cloud_run_v2_service_iam_member" "backend_allusers" {
project = var.project
project = var.project_id
location = google_cloud_run_v2_service.backend.location
name = google_cloud_run_v2_service.backend.name
role = "roles/run.invoker"
@ -360,7 +429,7 @@ resource "google_cloud_run_v2_service_iam_member" "backend_allusers" {
}
resource "google_cloud_run_v2_service_iam_member" "ui_allusers" {
project = var.project
project = var.project_id
location = google_cloud_run_v2_service.ui.location
name = google_cloud_run_v2_service.ui.name
role = "roles/run.invoker"
@ -374,6 +443,7 @@ resource "google_cloud_run_v2_service_iam_member" "ui_allusers" {
resource "google_cloud_run_v2_job" "migrations" {
name = "${local.name}-migrations"
location = var.region
labels = local.labels
template {
template {

View File

@ -26,6 +26,8 @@ resource "google_sql_database_instance" "writer" {
disk_size = 20
disk_autoresize = true
user_labels = local.labels
backup_configuration {
enabled = true
point_in_time_recovery_enabled = true
@ -45,6 +47,15 @@ resource "google_sql_database_instance" "writer" {
}
deletion_protection = var.cloudsql_deletion_protection
lifecycle {
# disk_autoresize grows storage but never shrinks it. Without this,
# the first plan after any auto-grow reads disk_size as a shrink, which
# is an immutable change and forces a destroy/recreate of the instance
# (full data loss). Set the initial size only; let Cloud SQL own it
# thereafter.
ignore_changes = [settings[0].disk_size]
}
}
resource "google_sql_database_instance" "reader" {
@ -61,6 +72,8 @@ resource "google_sql_database_instance" "reader" {
availability_type = "ZONAL"
disk_autoresize = true
user_labels = local.labels
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.this.id
@ -68,6 +81,12 @@ resource "google_sql_database_instance" "reader" {
}
deletion_protection = var.cloudsql_deletion_protection
lifecycle {
# Same autoresize footgun as the writer the replica grows its disk
# independently. Never let a perceived shrink replace the instance.
ignore_changes = [settings[0].disk_size]
}
}
resource "google_sql_database" "this" {
@ -91,6 +110,7 @@ resource "google_sql_user" "app" {
resource "google_secret_manager_secret" "db_password" {
secret_id = "${local.name}-db-password"
labels = local.labels
replication {
auto {}
}

View File

@ -0,0 +1,63 @@
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.
provider "registry.terraform.io/hashicorp/google" {
version = "6.50.0"
constraints = "~> 6.10"
hashes = [
"h1:79CwMTsp3Ud1nOl5hFS5mxQHyT0fGVye7pqpU0PPlHI=",
"zh:1f3513fcfcbf7ca53d667a168c5067a4dd91a4d4cccd19743e248ff31065503c",
"zh:3da7db8fc2c51a77dd958ea8baaa05c29cd7f829bd8941c26e2ea9cb3aadc1e5",
"zh:3e09ac3f6ca8111cbb659d38c251771829f4347ab159a12db195e211c76068bb",
"zh:7bb9e41c568df15ccf1a8946037355eefb4dfb4e35e3b190808bb7c4abae547d",
"zh:81e5d78bdec7778e6d67b5c3544777505db40a826b6eb5abe9b86d4ba396866b",
"zh:8d309d020fb321525883f5c4ea864df3d5942b6087f6656d6d8b3a1377f340fc",
"zh:93e112559655ab95a523193158f4a4ac0f2bfed7eeaa712010b85ebb551d5071",
"zh:d3efe589ffd625b300cef5917c4629513f77e3a7b111c9df65075f76a46a63c7",
"zh:d4a4d672bbef756a870d8f32b35925f8ce2ef4f6bbd5b71a3cb764f1b6c85421",
"zh:e13a86bca299ba8a118e80d5f84fbdd708fe600ecdceea1a13d4919c068379fe",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
"zh:fec30c095647b583a246c39d557704947195a1b7d41f81e369ba377d997faef6",
]
}
provider "registry.terraform.io/hashicorp/google-beta" {
version = "6.50.0"
constraints = "~> 6.10"
hashes = [
"h1:P2GiUJM1frlPtBViwKn1A9V2dVBdGuWcX80w9TdH8ZE=",
"zh:18b442bd0a05321d39dda1e9e3f1bdede4e61bc2ac62cc7a67037a3864f75101",
"zh:2e387c51455862828bec923a3ec81abf63a4d998da470cf00e09003bda53d668",
"zh:3942e708fa84ebe54996086f4b1398cb747fe19cbcd0be07ace528291fb35dee",
"zh:496287dd48b34ae6197cb1f887abeafd07c33f389dbe431bb01e24846754cfdd",
"zh:6eca885419969ce5c2a706f34dce1f10bde9774757675f2d8a92d12e5a1be390",
"zh:710dbef826c3fe7f76f844dae47937e8e4c1279dd9205ec4610be04cf3327244",
"zh:777ebf44b24bfc7bdbf770dc089f1a72f143b4718fdedb8c6bd75983115a1ec2",
"zh:9c8703bba37b8c7ad857efc3513392c5a096c519397c1cb822d7612f38e4262f",
"zh:c4f1d3a73de2702277c99d5348ad6d374705bcfdd367ad964ff4cfd2cf06c281",
"zh:eca8df11af3f5a948492d5b8b5d01b4ec705aad10bc30ec1524205508ae28393",
"zh:f41e7fd5f2628e8fd6b8ea136366923858f54428d1729898925469b862c275c2",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
]
}
provider "registry.terraform.io/hashicorp/random" {
version = "3.9.0"
constraints = "~> 3.6"
hashes = [
"h1:OO+IuvQJSPmWdN8AyyIEvPJbLvDQpgX/zbktoa9KsJE=",
"zh:161ad0bd9a75768c82f53fb6e7172a9d8be2d4889b012645a34795031aaf1bf1",
"zh:19dc9a5b17729725ccfc4f45b0500af0ee5bc6b6b160c7adb8f2bf617d2c80ea",
"zh:269eda8fe42daa7974d5a34d166c3ba9defe80cde86c01e4dadcfdf2e1f05e5f",
"zh:373f7c65566f8f2cc7f45d698654feb9d988996957e1266a69ca00c52d6d16d0",
"zh:5599d16804c41c83009ec621b6d6b6f74e102f5827678a4750f8809055546b61",
"zh:583be0440469a22bff70dcfa56593b01566860b29607437264adb51060cf46fc",
"zh:5f211d8ec3f2e1f414870d9584bfe26e6995560ef81c748f8447a48164767398",
"zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
"zh:7b547fd16216761ef86efc3ed516ac5ac0c5c42b7c7eb24a08cef2d93f69ed5e",
"zh:7e7c0679daf2a382151d05068c8c3f0dae6b7b7dccf818827b73dd08638df2ef",
"zh:8089dec888a8038b9b4fb23b3df7e1057293dbc5b60b42cc47ff690d69d4b61b",
"zh:c51f15a031edfd6f23ce8ced3446ca7f8d8d647e2499890d7d5d10d5016d7257",
"zh:c94784f005708890dc6895afd53636ec00ec1e430b15d41e5aebfb1d4b39bd04",
]
}

View File

@ -0,0 +1,134 @@
# Deploy LiteLLM on GCP
<walkthrough-tutorial-duration duration="25"></walkthrough-tutorial-duration>
This walkthrough provisions the full LiteLLM stack on GCP via Cloud Run, Cloud SQL, Memorystore Redis, and an external HTTPS load balancer. You'll answer a few prompts; DeployStack writes a `terraform.tfvars` and runs `terraform apply` against the project you select.
## Prerequisites
<walkthrough-project-billing-setup></walkthrough-project-billing-setup>
Pick the GCP project you want to deploy into, then make sure billing is enabled on it. The stack provisions paid resources (Cloud SQL, Memorystore, an LB anycast IP).
## Enable required APIs
The stack needs these APIs enabled in the target project. Click to enable, or run the gcloud command below.
<walkthrough-enable-apis apis="run.googleapis.com,sqladmin.googleapis.com,redis.googleapis.com,secretmanager.googleapis.com,vpcaccess.googleapis.com,compute.googleapis.com,servicenetworking.googleapis.com,storage.googleapis.com,artifactregistry.googleapis.com"></walkthrough-enable-apis>
```bash
gcloud services enable \
run.googleapis.com \
sqladmin.googleapis.com \
redis.googleapis.com \
secretmanager.googleapis.com \
vpcaccess.googleapis.com \
compute.googleapis.com \
servicenetworking.googleapis.com \
storage.googleapis.com \
artifactregistry.googleapis.com
```
## Create the Artifact Registry passthrough to GHCR
Cloud Run only pulls from Artifact Registry, `gcr.io`, or `docker.io`; it rejects `ghcr.io` URIs at apply time. The four LiteLLM images live on GHCR, so the stack needs a remote Artifact Registry repo pointed at GHCR. This is a one-time setup per project.
```bash
gcloud artifacts repositories create litellm \
--repository-format=docker \
--location=<walkthrough-watcher-constant key="region" default="us-central1"/> \
--mode=remote-repository \
--remote-repo-config-desc="GitHub Container Registry passthrough" \
--remote-docker-repo=https://ghcr.io
```
If the repo already exists, this command exits with a clear error and you can move on. Then set `image_registry` in `terraform.tfvars` to `<region>-docker.pkg.dev/<your-project>/litellm/berriai` before applying.
## (Optional) Set tenant secrets
The stack auto-generates a `LITELLM_MASTER_KEY` if you don't supply one. If you have an enterprise license or want a pre-chosen master key, export them as `TF_VAR_*` env vars before running the installer so they end up in Secret Manager but not in `terraform.tfvars`.
```bash
export TF_VAR_litellm_master_key="sk-..." # optional; auto-generated if omitted
export TF_VAR_litellm_license="lic-..." # optional; OSS-only without it
export TF_VAR_ui_password="..." # optional; falls back to master_key for UI login
```
Skip this step entirely for a trial deploy.
## Run the installer
DeployStack will prompt for project, region, tenant, env, image tag, and TLS posture, then run `terraform apply`. Open `<walkthrough-editor-open-file filePath="terraform/litellm/gcp/examples/default/deploystack.json">deploystack.json</walkthrough-editor-open-file>` if you want to see the prompt definitions first.
```bash
deploystack install
```
The first apply takes 20-25 minutes; most of that is Cloud SQL provisioning. The migration Cloud Run Job runs automatically once the database is ready, and only then do gateway, backend, and UI start.
## Grab the LB URL
```bash
terraform output lb_url
```
For trial deploys (`allow_plaintext_lb=true`), this is `http://<lb-ip>`. The UI lives at `/ui`; sign in with username `admin` and the master key:
```bash
gcloud secrets versions access latest \
--secret="$(terraform output -raw master_key_secret_id)"
```
## Going to TLS
If you picked `allow_plaintext_lb=true` to bootstrap but want HTTPS for real, point a DNS A record at the LB IP, then re-run terraform with `lb_domains` set and `allow_plaintext_lb` removed:
```bash
terraform apply \
-var 'lb_domains=["proxy.example.com"]'
```
Google-managed certs sit in `PROVISIONING` for 15-60 minutes after DNS propagates. You can watch the state with `gcloud compute ssl-certificates describe <tenant>-litellm-<env>-cert`.
## Adding provider API keys
Provider keys (OpenAI, Anthropic, etc.) belong in Secret Manager, not in `terraform.tfvars`. Create the secret first, then reference its resource ID from `gateway_extra_secrets` and re-apply:
```bash
echo -n "sk-proj-..." | gcloud secrets create openai-api-key --data-file=-
```
Edit `terraform.tfvars`:
```hcl
gateway_extra_secrets = {
OPENAI_API_KEY = "projects/<your-project>/secrets/openai-api-key"
}
proxy_config = {
model_list = [
{
model_name = "gpt-4o"
litellm_params = {
model = "openai/gpt-4o"
api_key = "os.environ/OPENAI_API_KEY"
}
},
]
}
```
Then `terraform apply`.
## Tearing it all down
```bash
deploystack uninstall
```
`cloudsql_deletion_protection` is `true` by default; flip it to `false` in `terraform.tfvars` and apply before uninstalling if you actually want the DB gone. Same goes for `gcs_force_destroy` on the bucket.
## You're done
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>
Full configuration reference is in `<walkthrough-editor-open-file filePath="terraform/litellm/gcp/README.md">README.md</walkthrough-editor-open-file>`, and every input variable on the underlying module lives in `<walkthrough-editor-open-file filePath="terraform/litellm/gcp/variables.tf">variables.tf</walkthrough-editor-open-file>`.

View File

@ -0,0 +1,37 @@
{
"title": "LiteLLM on GCP (Cloud Run)",
"name": "litellm-gcp",
"description": "Deploys the LiteLLM proxy on GCP: Cloud Run gateway/backend/UI, Cloud SQL with a read replica, Memorystore Redis, a GCS bucket, Secret Manager entries, and an external HTTPS load balancer. Takes ~20-25 minutes on the first apply.",
"duration": 25,
"documentation_link": "https://github.com/BerriAI/litellm/blob/main/terraform/litellm/gcp/README.md",
"collect_project": true,
"collect_region": true,
"region_type": "run",
"region_default": "us-central1",
"collect_zone": false,
"custom_settings": [
{
"name": "tenant",
"description": "Tenant slug used as the prefix for every GCP resource the stack creates (e.g. 'acme' produces 'acme-litellm-<env>-gateway'). 1-21 lowercase chars starting with a letter",
"default": "acme",
"validation": "^[a-z][a-z0-9-]{0,20}$"
},
{
"name": "env",
"description": "Environment suffix appended to every resource name (e.g. 'stage', 'prod', 'dev'). 1-9 lowercase chars starting with a letter",
"default": "stage",
"validation": "^[a-z][a-z0-9-]{0,8}$"
},
{
"name": "image_tag",
"description": "Tag for the four litellm-* images (gateway, backend, ui, migrations). Bump together when bumping LiteLLM",
"default": "v1.86.0-dev"
},
{
"name": "allow_plaintext_lb",
"description": "Skip TLS on the load balancer (HTTP-only). Set true for trial/dev. For production, leave false and add lb_domains to terraform.tfvars after the first apply",
"default": "true",
"options": ["true", "false"]
}
]
}

View File

@ -0,0 +1,51 @@
# One-command deploy of the LiteLLM GCP stack.
#
# cd terraform/litellm/gcp/examples/default
# cp terraform.tfvars.example terraform.tfvars # edit it
# terraform init
# terraform apply
#
# This root just wires the providers (see providers.tf) to the module. The
# module itself (../../) declares no provider, so it can also be consumed
# from your own config with count/for_each or impersonated-SA providers:
#
# module "litellm" {
# source = "github.com/BerriAI/litellm//terraform/litellm/gcp?ref=<tag>"
# ...
# }
#
# Note: the module declares no `configuration_aliases`, so it receives only the
# caller's single default google/google-beta providers a `for_each` over it
# runs every instance against the same project/region/credentials. To fan out
# across projects or regions, use one root per project. See the GCP README's
# "Using as a module" section.
#
# Knobs not surfaced as variables here (per-component sizing/instances,
# Cloud SQL tier/edition, Memorystore tier, per-component image overrides)
# can be set directly on this block see ../../variables.tf.
module "litellm" {
source = "../../"
project_id = var.project_id
region = var.region
tenant = var.tenant
env = var.env
litellm_master_key = var.litellm_master_key
litellm_license = var.litellm_license
ui_password = var.ui_password
image_registry = var.image_registry
image_tag = var.image_tag
lb_domains = var.lb_domains
allow_plaintext_lb = var.allow_plaintext_lb
cloudsql_deletion_protection = var.cloudsql_deletion_protection
gcs_force_destroy = var.gcs_force_destroy
proxy_config = var.proxy_config
gateway_extra_env = var.gateway_extra_env
backend_extra_env = var.backend_extra_env
gateway_extra_secrets = var.gateway_extra_secrets
backend_extra_secrets = var.backend_extra_secrets
}

View File

@ -0,0 +1,59 @@
output "lb_ip" {
description = "Global anycast IP of the external load balancer."
value = module.litellm.lb_ip
}
output "lb_url" {
description = "Proxy URL. Dashboard at /, API at /v1/*."
value = module.litellm.lb_url
}
output "gateway_service_url" {
description = "Default Cloud Run URL for the gateway (bypasses the LB)."
value = module.litellm.gateway_service_url
}
output "backend_service_url" {
description = "Default Cloud Run URL for the backend (bypasses the LB)."
value = module.litellm.backend_service_url
}
output "ui_service_url" {
description = "Default Cloud Run URL for the UI (bypasses the LB)."
value = module.litellm.ui_service_url
}
output "cloudsql_writer_ip" {
description = "Private IP of the Cloud SQL writer."
value = module.litellm.cloudsql_writer_ip
}
output "cloudsql_reader_ip" {
description = "Private IP of the Cloud SQL read replica."
value = module.litellm.cloudsql_reader_ip
}
output "redis_endpoint" {
description = "Memorystore Redis endpoint."
value = module.litellm.redis_endpoint
}
output "gcs_bucket" {
description = "GCS bucket name."
value = module.litellm.gcs_bucket
}
output "master_key_secret_id" {
description = "Secret Manager resource ID holding LITELLM_MASTER_KEY."
value = module.litellm.master_key_secret_id
}
output "db_password_secret_id" {
description = "Secret Manager resource ID holding the Cloud SQL app-user password."
value = module.litellm.db_password_secret_id
}
output "migration_run_command" {
description = "Break-glass command to re-run the one-off migration job."
value = module.litellm.migration_run_command
}

View File

@ -0,0 +1,17 @@
# Providers are configured HERE, in the root, not in the module. A module
# that declares its own configured `provider` block can't be called with
# count/for_each/depends_on and gives the caller no way to set an
# impersonated service account, a different project, or aliases.
#
# The module's resources inherit these default (unaliased) `google` /
# `google-beta` configs automatically through the module call, so project
# and region set here flow into every resource that doesn't pass its own.
provider "google" {
project = var.project_id
region = var.region
}
provider "google-beta" {
project = var.project_id
region = var.region
}

View File

@ -1,5 +1,5 @@
project = "my-gcp-project"
region = "us-central1"
project_id = "my-gcp-project"
region = "us-central1"
# Resource naming: every GCP resource the stack creates is named
# `${tenant}-litellm-${env}` (or that plus a per-resource suffix). E.g.
@ -28,14 +28,14 @@ env = "stage"
# cloudsql_deletion_protection = true # default: refuse destroy on the DB
# gcs_force_destroy = false # default: refuse destroy on a non-empty bucket
# Component images. Defaults pin all four to the same GHCR release tag —
# bump them together when bumping LiteLLM. To use private images, mirror
# them into Artifact Registry first — Cloud Run only authenticates against
# AR / gcr.io.
# gateway_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/gateway:1.86.0-dev"
# backend_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/backend:1.86.0-dev"
# ui_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/ui:1.86.0-dev"
# migrations_image = "us-central1-docker.pkg.dev/my-gcp-project/litellm/migrations:1.86.0-dev"
# Images. Cloud Run rejects ghcr.io, so a real deploy must point
# image_registry at an Artifact Registry remote repo (see README "Image
# pulls"); image_tag is applied to all four litellm-* images. Per-component
# *_image overrides are NOT exposed here — set them directly on the
# `module "litellm"` block in main.tf (see ../../variables.tf) if you need
# to mix-and-match versions.
# image_registry = "us-central1-docker.pkg.dev/my-gcp-project/litellm/berriai"
# image_tag = "v1.86.0-dev"
# ---------- proxy_config (mirrors helm gateway.config.proxy_config) ----------
# proxy_config = {
@ -75,3 +75,13 @@ env = "stage"
# OPENAI_API_KEY = "projects/my-gcp-project/secrets/openai-api-key"
# ANTHROPIC_API_KEY = "projects/my-gcp-project/secrets/anthropic-api-key"
# }
# ---------- OpenTelemetry v2 ----------
# OTel is gated on otel_endpoint: empty (default) and nothing is added to
# the container env; set it and both gateway and backend gain
# LITELLM_OTEL_V2=true plus the OTEL_* block (with OTEL_SERVICE_NAME
# stamped per component). These knobs aren't surfaced as wrapper vars in
# this example; set them directly on the `module "litellm"` block in
# main.tf (otel_endpoint, otel_exporter, otel_environment_name,
# otel_capture_message_content, otel_headers_secret). Full docs in
# ../../variables.tf.

View File

@ -0,0 +1,120 @@
# Curated surface for the one-command deploy path. The module (../../)
# exposes far more knobs (per-component CPU/memory/instances, Cloud SQL
# tier/edition, Memorystore tier, per-component image overrides, ). To
# tune those, set them directly on the `module "litellm"` block in
# main.tf, or call the module from your own root config. Full per-variable
# docs live in ../../variables.tf the module is the source of truth.
variable "project_id" {
description = "GCP project ID."
type = string
}
variable "region" {
description = "GCP region for VPC, Cloud SQL, Memorystore, Cloud Run, and the LB IP."
type = string
default = "us-central1"
}
variable "tenant" {
description = "Tenant slug — prefix for every resource (<tenant>-litellm-<env>)."
type = string
}
variable "env" {
description = "Environment suffix (stage, prod, dev)."
type = string
}
# Sensitive prefer TF_VAR_litellm_master_key / TF_VAR_litellm_license /
# TF_VAR_ui_password so values stay out of any committed tfvars file.
variable "litellm_master_key" {
description = "Pre-existing LITELLM_MASTER_KEY (sk-…). Empty → auto-generated."
type = string
default = ""
sensitive = true
}
variable "litellm_license" {
description = "LiteLLM enterprise license. Empty → OSS-only."
type = string
default = ""
sensitive = true
}
variable "ui_password" {
description = "UI admin password. Empty → falls back to LITELLM_MASTER_KEY."
type = string
default = ""
sensitive = true
}
# Image source. Cloud Run rejects ghcr.io, so a real deploy must point
# image_registry at an Artifact Registry remote repo (see README "Image
# pulls"). Per-component overrides live in ../../variables.tf.
variable "image_registry" {
description = "Registry path prefix; images composed as <image_registry>/litellm-<component>:<image_tag>."
type = string
default = "ghcr.io/berriai"
}
variable "image_tag" {
description = "Tag applied to all four litellm-* images. Bump in lockstep."
type = string
default = "v1.86.0-dev"
}
# TLS provide DNS names for a managed cert, or opt into HTTP-only for dev.
variable "lb_domains" {
description = "DNS names (already pointing at lb_ip) for a Google-managed cert. Empty → no TLS."
type = list(string)
default = []
}
variable "allow_plaintext_lb" {
description = "Opt into HTTP-only LB (trial/dev only)."
type = bool
default = false
}
variable "cloudsql_deletion_protection" {
description = "Cloud SQL deletion protection (writer + reader)."
type = bool
default = true
}
variable "gcs_force_destroy" {
description = "Allow destroy of a non-empty GCS bucket (ephemeral/CI only)."
type = bool
default = false
}
variable "proxy_config" {
description = "LiteLLM proxy config (contents of config.yaml). Empty → defaults."
type = any
default = {}
}
variable "gateway_extra_env" {
description = "Plain-text env vars layered onto the gateway."
type = map(string)
default = {}
}
variable "backend_extra_env" {
description = "Plain-text env vars layered onto the backend."
type = map(string)
default = {}
}
variable "gateway_extra_secrets" {
description = "Gateway env vars sourced from Secret Manager (name → secret resource ID)."
type = map(string)
default = {}
}
variable "backend_extra_secrets" {
description = "Backend env vars sourced from Secret Manager (name → secret resource ID)."
type = map(string)
default = {}
}

View File

@ -0,0 +1,18 @@
terraform {
required_version = ">= 1.6.0"
required_providers {
google = {
source = "hashicorp/google"
version = "~> 6.10"
}
google-beta = {
source = "hashicorp/google-beta"
version = "~> 6.10"
}
random = {
source = "hashicorp/random"
version = "~> 3.6"
}
}
}

View File

@ -7,7 +7,7 @@ resource "random_id" "bucket_suffix" {
}
resource "google_storage_bucket" "this" {
name = "${var.project}-${local.name}-${random_id.bucket_suffix.hex}"
name = "${var.project_id}-${local.name}-${random_id.bucket_suffix.hex}"
location = var.region
uniform_bucket_level_access = true
force_destroy = var.gcs_force_destroy
@ -18,7 +18,7 @@ resource "google_storage_bucket" "this" {
public_access_prevention = "enforced"
labels = var.labels
labels = local.labels
}
# Cloud Run runtime SA gains object admin on this bucket only.
@ -27,3 +27,43 @@ resource "google_storage_bucket_iam_member" "runtime" {
role = "roles/storage.objectAdmin"
member = "serviceAccount:${google_service_account.runtime.email}"
}
# Dedicated bucket holding only config.yaml. Mounted read-only into the
# gateway and backend via Cloud Run v2's gcsfuse volume. Kept separate from
# the data-plane bucket above so the runtime SA can hold a narrower
# objectViewer binding here (config is read-only at runtime) while keeping
# objectAdmin on the data-plane bucket. Only created when proxy_config is
# non-empty.
resource "google_storage_bucket" "proxy_config" {
count = local.proxy_config_enabled ? 1 : 0
name = "${var.project_id}-${local.name}-config-${random_id.bucket_suffix.hex}"
location = var.region
uniform_bucket_level_access = true
force_destroy = var.gcs_force_destroy
versioning {
enabled = true
}
public_access_prevention = "enforced"
labels = local.labels
}
resource "google_storage_bucket_object" "proxy_config" {
count = local.proxy_config_enabled ? 1 : 0
name = local.proxy_config_file_name
bucket = google_storage_bucket.proxy_config[0].name
content = local.proxy_config_yaml
content_type = "application/yaml"
}
resource "google_storage_bucket_iam_member" "proxy_config_runtime" {
count = local.proxy_config_enabled ? 1 : 0
bucket = google_storage_bucket.proxy_config[0].name
role = "roles/storage.objectViewer"
member = "serviceAccount:${google_service_account.runtime.email}"
}

View File

@ -21,7 +21,7 @@ resource "google_service_account" "ui_runtime" {
# Cloud SQL client lets the Cloud Run services connect to the instance
# over private IP via the VPC connector.
resource "google_project_iam_member" "runtime_cloudsql" {
project = var.project
project = var.project_id
role = "roles/cloudsql.client"
member = "serviceAccount:${google_service_account.runtime.email}"
}
@ -69,3 +69,13 @@ resource "google_secret_manager_secret_iam_member" "extras" {
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.runtime.email}"
}
# OTEL_HEADERS secret accessor only created when var.otel_headers_secret
# is set. Carries the OTLP collector's auth header(s).
resource "google_secret_manager_secret_iam_member" "otel_headers" {
count = var.otel_headers_secret == "" ? 0 : 1
secret_id = var.otel_headers_secret
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.runtime.email}"
}

View File

@ -14,7 +14,8 @@ locals {
}
resource "google_compute_global_address" "lb" {
name = "${local.name}-lb-ip"
name = "${local.name}-lb-ip"
labels = local.labels
}
# Serverless NEGs one per Cloud Run service.
@ -148,6 +149,7 @@ resource "google_compute_global_forwarding_rule" "http" {
load_balancing_scheme = "EXTERNAL_MANAGED"
ip_address = google_compute_global_address.lb.address
target = google_compute_target_http_proxy.this.id
labels = local.labels
}
# ---------- HTTPS (gated on var.lb_domains) ----------
@ -160,11 +162,22 @@ resource "google_compute_global_forwarding_rule" "http" {
resource "google_compute_managed_ssl_certificate" "this" {
count = local.tls_enabled ? 1 : 0
name = "${local.name}-cert"
# A managed cert's `domains` is immutable, so changing var.lb_domains
# forces replacement, and the cert is referenced by the HTTPS target
# proxy a destroy-then-create replacement fails with
# `resourceInUseByAnotherResource`. Hashing the domains into the name
# makes the name change with the domain set, so create_before_destroy
# builds the new cert + repoints the proxy before deleting the old one.
name = "${local.name}-cert-${substr(sha1(join(",", var.lb_domains)), 0, 8)}"
managed {
domains = var.lb_domains
}
lifecycle {
create_before_destroy = true
}
}
resource "google_compute_target_https_proxy" "this" {
@ -182,4 +195,5 @@ resource "google_compute_global_forwarding_rule" "https" {
load_balancing_scheme = "EXTERNAL_MANAGED"
ip_address = google_compute_global_address.lb.address
target = google_compute_target_https_proxy.this[0].id
labels = local.labels
}

View File

@ -8,6 +8,19 @@ locals {
# the stack can reference local.name.
name = "${var.tenant}-litellm-${var.env}"
# Mirrors the AWS stack's local.tags: the module stamps its own
# `litellm-stack` / `managed-by` labels onto every label-supporting
# resource (Cloud Run, Cloud SQL, Memorystore, Secret Manager, GCS) and
# merges var.labels on top. GCP label keys/values are lower-kebab/snake
# only, so the key is `litellm-stack`, not AWS's `litellm:stack`.
labels = merge(
{
"litellm-stack" = local.name
"managed-by" = "terraform"
},
var.labels,
)
gateway_path_prefixes = [
"/v1/chat/*", "/chat/*",
"/v1/completions*", "/completions*",
@ -62,11 +75,18 @@ locals {
]
proxy_config_enabled = length(keys(var.proxy_config)) > 0
proxy_config_b64 = local.proxy_config_enabled ? base64encode(yamlencode(var.proxy_config)) : ""
proxy_config_yaml = local.proxy_config_enabled ? yamlencode(var.proxy_config) : ""
proxy_config_mount_path = "/etc/litellm"
proxy_config_file_name = "config.yaml"
proxy_config_volume = "proxy-config"
proxy_config_env = local.proxy_config_enabled ? [
{ name = "LITELLM_PROXY_CONFIG_B64", value = local.proxy_config_b64 },
{ name = "CONFIG_FILE_PATH", value = "/tmp/litellm-config.yaml" },
{ name = "CONFIG_FILE_PATH", value = "${local.proxy_config_mount_path}/${local.proxy_config_file_name}" },
# Forces a new Cloud Run revision when the YAML changes; gcsfuse only
# surfaces the new object on container restart, so without this an
# updated proxy_config would sit in the bucket unread.
{ name = "PROXY_CONFIG_HASH", value = md5(local.proxy_config_yaml) },
] : []
# Resolved image URIs: per-component override wins, otherwise compose

View File

@ -59,6 +59,6 @@ output "migration_run_command" {
"gcloud run jobs execute %s --region %s --project %s --wait",
google_cloud_run_v2_job.migrations.name,
var.region,
var.project,
var.project_id,
)
}

View File

@ -1,9 +0,0 @@
provider "google" {
project = var.project
region = var.region
}
provider "google-beta" {
project = var.project
region = var.region
}

View File

@ -9,6 +9,8 @@ resource "google_redis_instance" "this" {
redis_version = "REDIS_7_0"
labels = local.labels
# In-transit encryption between Cloud Run and Memorystore. The instance
# exposes its self-signed CA via `server_ca_certs` (read in cloudrun.tf
# and passed to the proxy as REDIS_CA_PEM_B64); the proxy decodes it to

View File

@ -10,6 +10,7 @@ resource "random_password" "master_key" {
# account gets accessor permission on it (see iam.tf).
resource "google_secret_manager_secret" "master_key" {
secret_id = "${local.name}-master-key"
labels = local.labels
replication {
auto {}
}
@ -29,6 +30,7 @@ resource "google_secret_manager_secret" "license" {
count = var.litellm_license == "" ? 0 : 1
secret_id = "${local.name}-license"
labels = local.labels
replication {
auto {}
}
@ -49,6 +51,7 @@ resource "google_secret_manager_secret" "ui_password" {
count = var.ui_password == "" ? 0 : 1
secret_id = "${local.name}-ui-password"
labels = local.labels
replication {
auto {}
}

View File

@ -1,4 +1,4 @@
variable "project" {
variable "project_id" {
description = "GCP project ID."
type = string
}
@ -30,11 +30,9 @@ variable "env" {
}
variable "labels" {
description = "Resource labels merged into every label-supporting resource."
description = "Per-deployment labels applied to every label-supporting resource the module creates, on top of the module's own `litellm-stack` / `managed-by` labels. Mirrors the AWS stack's `tags` input."
type = map(string)
default = {
"managed-by" = "terraform"
}
default = {}
}
# ---------- Tenant-supplied secrets ----------
@ -171,6 +169,17 @@ variable "gateway_memory" {
default = "4Gi"
}
variable "gateway_num_workers" {
description = "uvicorn worker processes per gateway instance (passed as --workers). Size relative to gateway_cpu — uvicorn recommends ~(2 × vCPU) + 1 for CPU-bound work. Mirrors the AWS stack's gateway_num_workers."
type = number
default = 1
validation {
condition = var.gateway_num_workers >= 1
error_message = "gateway_num_workers must be >= 1."
}
}
# Cloud Run autoscales out of the box (request-rate driven). The min/max
# bounds mirror the HPA replica bounds in helm/litellm/values.yaml so each
# stack scales over the same range. Cloud Run has no direct CPU-utilization
@ -394,12 +403,90 @@ variable "backend_extra_secrets" {
variable "proxy_config" {
description = <<-EOT
LiteLLM proxy config (contents of config.yaml). Mirrors the helm chart's
`gateway.config.proxy_config`. Passed to gateway, backend, and the
migration job as a base64-encoded env var and decoded to
/tmp/litellm-config.yaml at container start; CONFIG_FILE_PATH is set
automatically. Reference env-injected secrets from the YAML via
`os.environ/<NAME>`. Leave empty ({}) to skip.
`gateway.config.proxy_config`. YAML-encoded and uploaded to a dedicated
GCS bucket as `config.yaml`, then mounted read-only into the gateway
and backend at `/etc/litellm` via Cloud Run v2's gcsfuse volume;
CONFIG_FILE_PATH is set automatically. A hash of the YAML is wired in
as an env var so a config-only edit forces a new revision (gcsfuse
surfaces the new object on container restart). Reference env-injected
secrets from the YAML via `os.environ/<NAME>`. Leave empty ({}) to
skip the bucket isn't created and no volume is mounted.
EOT
type = any
default = {}
}
# ---------- OpenTelemetry v2 ----------
#
# https://docs.litellm.ai/docs/observability/opentelemetry_v2
#
# OTel v2 is opt-in and gated entirely on otel_endpoint, matching the AWS
# stack. Leave otel_endpoint = "" and nothing OTel-related is added to the
# container env. Set it and the gateway/backend gain LITELLM_OTEL_V2=true
# plus the OTEL_* block (per-component OTEL_SERVICE_NAME, exporter, endpoint,
# environment name, capture-content), with OTEL_HEADERS sourced from
# otel_headers_secret when provided.
variable "otel_endpoint" {
description = <<-EOT
OTLP collector URL (e.g. https://otel.example.com:4318 for HTTP, or
your collector's :4317 for gRPC). Empty disables OTel entirely (no
LITELLM_OTEL_V2, no OTEL_* env). When set, LITELLM_OTEL_V2=true plus
OTEL_EXPORTER / OTEL_ENDPOINT are injected and spans ship to the
collector.
EOT
type = string
default = ""
}
variable "otel_exporter" {
description = <<-EOT
OTel exporter protocol. Ignored when otel_endpoint is empty. `otlp_http`
is the safer default (works through a vanilla L7 ingress); `otlp_grpc`
needs the collector reachable over h2 and the `grpcio` extra installed
in the proxy image.
EOT
type = string
default = "otlp_http"
validation {
condition = contains(["otlp_http", "otlp_grpc", "console"], var.otel_exporter)
error_message = "otel_exporter must be one of: otlp_http, otlp_grpc, console."
}
}
variable "otel_headers_secret" {
description = <<-EOT
Optional Secret Manager secret resource ID
(`projects/<project>/secrets/<name>`) whose latest version is the
value of OTEL_HEADERS used for collector auth, e.g.
`Authorization=Bearer <token>`. Mounted as an env-var secret_key_ref;
the runtime SA auto-gains roles/secretmanager.secretAccessor.
EOT
type = string
default = ""
}
variable "otel_environment_name" {
description = <<-EOT
Value for OTEL_ENVIRONMENT_NAME (becomes `deployment.environment` on
every span). Defaults to var.env so spans land tagged with the
deployment env without extra wiring.
EOT
type = string
default = ""
}
variable "otel_capture_message_content" {
description = <<-EOT
Value for OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT. Default
`no_content` matches the litellm default; flip to `prompt_and_completion`
only when you've audited what's about to land in your observability
backend, because raw prompts/completions are typically sensitive.
EOT
type = string
default = "no_content"
validation {
condition = contains(["no_content", "prompt_and_completion"], var.otel_capture_message_content)
error_message = "otel_capture_message_content must be one of: no_content, prompt_and_completion."
}
}