fix(iac-workflow): make S3-compatible remote state mandatory (no local fallback)

Previously 'Configure remote backend' had `if: TF_STATE_BUCKET != ''`, so when
the gate evaluated empty the step was skipped and terraform silently fell back to
local state — risking state loss on destroy. TF_STATE_* exist in Vault, so make
the remote backend the default required path:

- Validate step now requires TF_STATE_{ENDPOINT,BUCKET,ACCESS_KEY,SECRET_KEY}
- 'Configure remote backend' always runs (renders backend.tf)
- terraform init fails fast if TF_STATE_BUCKET empty (removed local-state else)
- header comment updated: backend keys are required, not optional
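
For reference, the required-key check described in the list above can be sketched as a small standalone bash script. This is an illustration, not the workflow's actual step: `missing_vars` is a hypothetical helper, the exported values are placeholders, and bash indirect expansion (`${!k:-}`) is swapped in for the `eval echo` form used in the diff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the workflow): prints the name of every
# listed variable that is unset or empty, one per line.
missing_vars() {
  local k
  for k in "$@"; do
    # ${!k:-} is bash indirect expansion: the value of the variable named by $k,
    # or an empty string if it is unset (safe under `set -u`).
    if [ -z "${!k:-}" ]; then
      printf '%s\n' "$k"
    fi
  done
}

# Placeholder values for the demo; in the workflow these come from Vault.
export TF_STATE_ENDPOINT="https://example.invalid"
export TF_STATE_BUCKET="ai-workspace-tfstate"
export TF_STATE_ACCESS_KEY="example-access-key"
unset TF_STATE_SECRET_KEY

result="$(missing_vars TF_STATE_ENDPOINT TF_STATE_BUCKET TF_STATE_ACCESS_KEY TF_STATE_SECRET_KEY)"
echo "missing: ${result}"
```

A caller would fail fast by exiting non-zero whenever `result` is non-empty, which mirrors the `missing=1` / `exit 1` pattern in the Validate step.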

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Haitao Pan 2026-06-25 22:09:44 +08:00
parent 5ce6dad9bc
commit 09a8bae35d

@@ -24,12 +24,13 @@ name: Deploy AI Workspace (IaC + Ansible + Cloudflare)
# NVIDIA_API_KEY → same as above
# OLLAMA_API_KEY → same as above
#
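# 4. Vault KV optional keys (if unset, local state is used; setting them is recommended in production to avoid losing state on destroy)
# 4. Vault KV required keys (the remote S3-compatible state backend is mandatory; missing keys fail fast,
# with no fallback to local state, so destroy cannot lose state)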
# kv/CICD:
# TF_STATE_ENDPOINT → S3-compatible object storage API URL (e.g. https://ewr1.vultrobjects.com)
# TF_STATE_ENDPOINT → S3-compatible object storage API URL (e.g. https://<acct>.r2.cloudflarestorage.com)
# TF_STATE_BUCKET → bucket name (e.g. ai-workspace-tfstate)
# TF_STATE_ACCESS_KEY / TF_STATE_SECRET_KEY → object storage credentials
# TF_STATE_REGION → region (us-east-1 for Vultr; must be auto for Cloudflare R2)
# TF_STATE_REGION → region (must be auto for Cloudflare R2; us-east-1 for Vultr)
# → for the object storage setup guide, see docs/operations/iac-prerequisites.md §3
#
# 5. ai-workspace-infra private repository (optional, speeds things up)
@@ -142,15 +143,25 @@ jobs:
- name: Validate required secrets
env:
VULTR_API_KEY: ${{ steps.vault.outputs.VULTR_API_KEY }}
TF_STATE_ENDPOINT: ${{ steps.vault.outputs.TF_STATE_ENDPOINT }}
TF_STATE_BUCKET: ${{ steps.vault.outputs.TF_STATE_BUCKET }}
TF_STATE_ACCESS_KEY: ${{ steps.vault.outputs.TF_STATE_ACCESS_KEY }}
TF_STATE_SECRET_KEY: ${{ steps.vault.outputs.TF_STATE_SECRET_KEY }}
run: |
set -euo pipefail
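# Only check that REQUIRED secrets are non-empty (no values are printed, only emptiness is tested); optional keys
# (INFRA_REPO_TOKEN / TF_STATE_*) are not checked here.
# Check that REQUIRED secrets are non-empty (no values are printed, only emptiness is tested).
# The remote S3-compatible state backend is mandatory (on by default, no fallback to local state).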
missing=0
if [ -z "${VULTR_API_KEY:-}" ]; then
echo "::error::Missing required secret VULTR_API_KEY (Vault: ${VAULT_KV}/VULTR_API_KEY)"
missing=1
fi
for k in TF_STATE_ENDPOINT TF_STATE_BUCKET TF_STATE_ACCESS_KEY TF_STATE_SECRET_KEY; do
if [ -z "$(eval echo \"\${$k:-}\")" ]; then
echo "::error::Missing required secret $k (Vault: ${VAULT_KV}/$k): the remote S3 state backend is mandatory"
missing=1
fi
done
[ "$missing" -eq 0 ] || { echo "::error::Required secrets are missing, aborting provision"; exit 1; }
- name: Checkout iac_modules
@@ -178,8 +189,7 @@ jobs:
- name: Install render deps
run: pip install --quiet pyyaml jinja2
- name: Configure remote backend (optional)
if: ${{ steps.vault.outputs.TF_STATE_BUCKET != '' }}
- name: Configure remote backend (S3-compatible, required)
working-directory: ${{ env.ENV_DIR }}
env:
TF_STATE_ENDPOINT: ${{ steps.vault.outputs.TF_STATE_ENDPOINT }}
@@ -199,15 +209,16 @@ jobs:
TF_STATE_REGION: ${{ steps.vault.outputs.TF_STATE_REGION }}
run: |
set -euo pipefail
if [ -n "${TF_STATE_BUCKET}" ]; then
terraform init -input=false \
-backend-config="bucket=${TF_STATE_BUCKET}" \
-backend-config="key=ai-workspace/terraform.tfstate" \
-backend-config="region=${TF_STATE_REGION:-auto}"
else
echo "::warning::No remote state configured (Vault has no TF_STATE_BUCKET); using local state. Only suitable for one-off demos, and destroy must happen in the same run"
terraform init -input=false
# The remote S3-compatible state backend is mandatory (backend.tf was rendered by the previous step);
# a missing bucket fails immediately, with no fallback to local state.
if [ -z "${TF_STATE_BUCKET}" ]; then
echo "::error::TF_STATE_BUCKET is empty; the remote state backend is mandatory, aborting"
exit 1
fi
terraform init -input=false \
-backend-config="bucket=${TF_STATE_BUCKET}" \
-backend-config="key=ai-workspace/terraform.tfstate" \
-backend-config="region=${TF_STATE_REGION:-auto}"
- name: Terraform ${{ github.event.inputs.terraform_action || 'apply' }}
working-directory: ${{ env.ENV_DIR }}