ci+docs(vault): SSH key B64-preferred pattern + xworkspace-console Vault setup

- deploy job: read ANSIBLE_SSH_KEY_B64 (preferred) + ANSIBLE_SSH_KEY (fallback)
  from Vault, decode/write ~/.ssh/id_deploy and ssh-keygen -y self-check —
  matches the org SSH-deploy runbook (avoids multiline-key libcrypto errors).
- docs/operations/vault-github-actions.md: full Vault role/policy/jwt/KV setup
  for github-actions-xworkspace-console, mirroring the existing org records.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Haitao Pan 2026-06-24 15:21:01 +08:00
parent 75d3098d1c
commit 04d349073e
2 changed files with 125 additions and 4 deletions


@@ -217,8 +217,10 @@ jobs:
method: jwt
role: ${{ env.VAULT_ROLE }}
jwtGithubAudience: vault
ignoreNotFound: true
secrets: |
${{ env.VAULT_KV }} ANSIBLE_SSH_KEY | ANSIBLE_SSH_KEY ;
${{ env.VAULT_KV }} ANSIBLE_SSH_KEY_B64 | ANSIBLE_SSH_KEY_B64 ;
${{ env.VAULT_KV }} DEEPSEEK_API_KEY | DEEPSEEK_API_KEY ;
${{ env.VAULT_KV }} NVIDIA_API_KEY | NVIDIA_API_KEY ;
${{ env.VAULT_KV }} OLLAMA_API_KEY | OLLAMA_API_KEY
@@ -229,12 +231,24 @@ jobs:
name: ai-workspace-cmdb
path: cmdb
- name: Configure SSH
- name: Configure SSH (prefer base64 key, fall back to raw)
env:
ANSIBLE_SSH_KEY: ${{ steps.vault.outputs.ANSIBLE_SSH_KEY }}
ANSIBLE_SSH_KEY_B64: ${{ steps.vault.outputs.ANSIBLE_SSH_KEY_B64 }}
run: |
set -euo pipefail
mkdir -p ~/.ssh
printf '%s\n' "${{ steps.vault.outputs.ANSIBLE_SSH_KEY }}" > ~/.ssh/id_ed25519
chmod 600 ~/.ssh/id_ed25519
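# Convention: decode the single-line *_B64 value first, then fall back to the raw multiline
# private key, to avoid the "Load key ... error in libcrypto" failure that GitHub Actions
# hits with multiline private keys.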
if [ -n "${ANSIBLE_SSH_KEY_B64:-}" ]; then
printf '%s' "${ANSIBLE_SSH_KEY_B64}" | base64 -d > ~/.ssh/id_deploy
elif [ -n "${ANSIBLE_SSH_KEY:-}" ]; then
printf '%s\n' "${ANSIBLE_SSH_KEY}" > ~/.ssh/id_deploy
else
echo "::error::Vault did not provide ANSIBLE_SSH_KEY[_B64]"; exit 1
fi
chmod 600 ~/.ssh/id_deploy
ssh-keygen -y -f ~/.ssh/id_deploy >/dev/null
- name: Wait for host SSH
run: |
@@ -257,7 +271,7 @@ jobs:
ip="$(jq -r '.["${{ matrix.host }}"].ip' cmdb/cmdb.json)"
user="$(jq -r '.["${{ matrix.host }}"].ansible_user // "root"' cmdb/cmdb.json)"
echo "Bootstrapping ${{ matrix.host }} (${user}@${ip}) on-host ..."
ssh -i ~/.ssh/id_ed25519 \
ssh -i ~/.ssh/id_deploy \
-o StrictHostKeyChecking=accept-new \
-o ServerAliveInterval=20 -o ServerAliveCountMax=15 \
-o ConnectTimeout=20 \


@@ -0,0 +1,107 @@
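# Vault + GitHub Actions setup (xworkspace-console)

This document records how GitHub Actions in the `xworkspace-console` repository log in to HashiCorp Vault
(https://vault.svc.plus) over OIDC and read KV secrets that are isolated per repository.
It covers only the procedure, paths, field names, and configuration principles; it **contains no sensitive values**.

It follows the existing org-wide pattern (see `vault-github-actions-2026-06-06.md` and
`vault-github-actions-ssh-deploy-runbook.md` in the ai-workspace docs). A new repository only needs: one policy,
one role, the matching `kv/data/github-actions/<repo>` path, and the vault-action step in its workflow.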
## 1. Global prerequisites (already in place, do not recreate)

- `jwt` auth mount (UI: Access → Authentication Methods):
  - Type `jwt`, Path `jwt/`, Accessor e.g. `auth_jwt_6fd8b418`
  - `oidc_discovery_url = https://token.actions.githubusercontent.com`
  - `bound_issuer = https://token.actions.githubusercontent.com`
  - Default/Max Lease TTL: 1 month 1 day (kept as currently configured)
## 2. Repository-specific policy + role

Naming is uniform: role/policy = `github-actions-xworkspace-console`, and the KV read path is
`kv/data/github-actions/xworkspace-console`.
```bash
# 2.1 policy: allow read access only to this repository's KV path
vault policy write github-actions-xworkspace-console - <<'EOF'
path "kv/data/github-actions/xworkspace-console" {
capabilities = ["read"]
}
path "kv/metadata/github-actions/xworkspace-console" {
capabilities = ["read", "list"]
}
EOF
# 2.2 role: bind only this repository's GitHub OIDC identity
vault write auth/jwt/role/github-actions-xworkspace-console \
role_type="jwt" \
user_claim="repository" \
bound_audiences="vault" \
bound_claims_type="glob" \
bound_claims='{"repository":"ai-workspace-lab/xworkspace-console","sub":"repo:ai-workspace-lab/xworkspace-console:*"}' \
token_policies="github-actions-xworkspace-console" \
token_ttl="20m" \
token_max_ttl="30m"
```
> The permission model matches the other repositories: `repository = ai-workspace-lab/xworkspace-console`,
> `sub = repo:ai-workspace-lab/xworkspace-console:*`, `bound_audiences=["vault"]`, and
> a policy that reads only its own KV path. To restrict access to a single branch, narrow `sub` to
> `repo:ai-workspace-lab/xworkspace-console:ref:refs/heads/main`.
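To make the effect of the wide and narrowed `sub` patterns concrete, the sketch below approximates glob matching with bash pattern matching. It is an illustration only: `sub_matches` is a hypothetical helper, the sample `sub` values assume GitHub's `repo:<org>/<repo>:ref:refs/heads/<branch>` form for branch-triggered runs, and bash patterns also give meaning to `?` and `[...]`, which Vault's glob matching may treat differently.

```bash
#!/usr/bin/env bash
# Illustration only: approximates glob claim matching with bash patterns.
# sub_matches is a hypothetical helper, not a Vault command.
sub_matches() {
  local pattern="$1" sub="$2"
  # The unquoted right-hand side makes [[ ]] treat $pattern as a glob.
  [[ "$sub" == $pattern ]]
}

wide='repo:ai-workspace-lab/xworkspace-console:*'
narrow='repo:ai-workspace-lab/xworkspace-console:ref:refs/heads/main'

# Sample subjects (assumed shapes, not captured from a real run).
main_sub='repo:ai-workspace-lab/xworkspace-console:ref:refs/heads/main'
feature_sub='repo:ai-workspace-lab/xworkspace-console:ref:refs/heads/feature-x'
other_repo_sub='repo:ai-workspace-lab/other-repo:ref:refs/heads/main'

for sub in "$main_sub" "$feature_sub" "$other_repo_sub"; do
  for pattern in "$wide" "$narrow"; do
    if sub_matches "$pattern" "$sub"; then verdict=allow; else verdict=deny; fi
    printf '%-5s %s  (pattern: %s)\n' "$verdict" "$sub" "$pattern"
  done
done
```

With the wide pattern every ref of this repository is accepted; with the narrowed one only the `main` branch subject matches, and other repositories are rejected in both cases.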
## 3. KV fields (`kv/data/github-actions/xworkspace-console`)

Each job in deploy-ai-workspace-iac.yaml reads the fields it needs:

| Field | Purpose | Required |
| --- | --- | --- |
| `VULTR_API_KEY` | provision: `TF_VAR_vultr_api_key` | Yes |
| `INFRA_REPO_TOKEN` | PAT for checking out the private `ai-workspace-infra` repo | Yes, if the repo is private |
| `ANSIBLE_SSH_KEY` | SSH private key for connecting to hosts (raw multiline) | One of the two |
| `ANSIBLE_SSH_KEY_B64` | The same key as single-line base64 (**preferred**; avoids the multiline-key libcrypto error) | One of the two |
| `CLOUDFLARE_API_TOKEN` | dns: Cloudflare DNS edit token | Yes, when syncing DNS |
| `DEEPSEEK_API_KEY` / `NVIDIA_API_KEY` / `OLLAMA_API_KEY` | deploy: LLM provider keys injected into hosts | Yes |
| `TF_STATE_ENDPOINT` / `TF_STATE_BUCKET` / `TF_STATE_ACCESS_KEY` / `TF_STATE_SECRET_KEY` / `TF_STATE_REGION` | provision: remote S3-compatible TF state; local state is used when unset | No |
```bash
# Write example (never commit sensitive values or paste them into logs); store the SSH key in both raw and B64 form
vault kv put kv/github-actions/xworkspace-console \
VULTR_API_KEY=... \
INFRA_REPO_TOKEN=... \
CLOUDFLARE_API_TOKEN=... \
DEEPSEEK_API_KEY=... NVIDIA_API_KEY=... OLLAMA_API_KEY=... \
ANSIBLE_SSH_KEY=@/path/to/id_deploy \
ANSIBLE_SSH_KEY_B64="$(base64 -w0 < /path/to/id_deploy)"
```
> The SSH private key must pair with `ssh_keys[].public` in the `vultr-vps` resource declaration
> `config/resources/ai-workspace-hosts.yaml`.
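One way to confirm the pairing before writing the key to Vault is to derive the public key from the private key and compare it with the declared one. The helper below is a minimal sketch: `key_matches_public` is a hypothetical name, it assumes an unencrypted private key, and it compares only the key type and key body, ignoring the trailing comment.

```bash
#!/usr/bin/env bash
# key_matches_public PRIVATE_KEY_FILE PUBLIC_KEY_LINE
# Succeeds when the public key derived from the private key equals the given
# public key line (key type and body only; the comment field is ignored).
# Hypothetical helper; assumes the private key has no passphrase.
key_matches_public() {
  local private_file="$1" public_line="$2" derived expected
  derived="$(ssh-keygen -y -f "$private_file" 2>/dev/null | awk '{print $1, $2}')"
  expected="$(printf '%s\n' "$public_line" | awk '{print $1, $2}')"
  # An empty $derived means ssh-keygen could not load the private key.
  [ -n "$derived" ] && [ "$derived" = "$expected" ]
}

# Usage (placeholder path and variable):
#   key_matches_public /path/to/id_deploy "$declared_public_key" && echo "paired"
```

Here `$declared_public_key` stands for the value copied from the `ssh_keys[].public` entry; how that value is extracted from the YAML file is left out on purpose.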
## 4. Workflow integration (already in place)

`.github/workflows/deploy-ai-workspace-iac.yaml`:

1. `permissions.id-token: write` (+ `contents: read`).
2. Each job uses `hashicorp/vault-action@v2` with `method: jwt`,
   `role: github-actions-xworkspace-console`, and `jwtGithubAudience: vault` to read the fields it needs from
   `kv/data/github-actions/xworkspace-console` (optional fields rely on
   `ignoreNotFound: true`).
3. Steps consume `steps.vault.outputs.<KEY>`; GitHub Actions Secrets are no longer used.
4. The SSH key is written to disk by decoding `ANSIBLE_SSH_KEY_B64` first and falling back to `ANSIBLE_SSH_KEY`,
   then self-checked with `ssh-keygen -y -f`.
5. Remote TF state is enabled when `steps.vault.outputs.TF_STATE_BUCKET` is non-empty.
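The B64-first rule in step 4 only helps if the stored base64 value decodes back to the exact key bytes. A minimal local check, run before writing the value to Vault, could look like the sketch below. The function names are hypothetical, and `tr -d '\n'` is used instead of `base64 -w0` on the assumption that the latter may not be available on every platform.

```bash
#!/usr/bin/env bash
# b64_single_line FILE
# Print FILE as a single base64 line (line wraps removed).
b64_single_line() {
  base64 < "$1" | tr -d '\n'
}

# b64_roundtrip_ok FILE
# Succeed only if encoding and then decoding reproduces FILE byte for byte.
b64_roundtrip_ok() {
  b64_single_line "$1" | base64 -d | cmp -s - "$1"
}

# Usage (placeholder path):
#   b64_roundtrip_ok /path/to/id_deploy && b64_single_line /path/to/id_deploy
```

If the round trip succeeds, the printed single-line value is what would go into `ANSIBLE_SSH_KEY_B64`; avoid echoing it in shared terminals or CI logs.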
## 5. Acceptance steps

1. Trigger `Deploy AI Workspace (IaC + Ansible + Cloudflare)` (workflow_dispatch).
2. Confirm that `Load Vault secrets (OIDC)` succeeds in every job (the repository's KV is read).
3. provision: `Terraform apply` succeeds and produces the `cmdb.json` matrix.
4. deploy: `Configure SSH` succeeds (B64 preferred) and `Run on-host bootstrap` succeeds.
5. dns: `Reconcile Cloudflare DNS` succeeds.
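## 6. Troubleshooting

- `vault-action` reports `valid path and key`: check that the lines in `secrets` are separated by `;` and that the KV path includes `data/`.
- `permission denied` / role mismatch: compare the OIDC `sub` and `repository` claims against the role's `bound_claims`.
- `Load key ... error in libcrypto`: confirm that the workflow reads `ANSIBLE_SSH_KEY_B64` and decodes it first.
- `Permission denied (publickey)`: verify SSH locally with the same private key before updating Vault, and confirm that it pairs with the public key in hosts.yaml.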
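When chasing a `permission denied` or role mismatch, it can help to look at the claims the OIDC token actually carries. The sketch below decodes the payload segment of a JWT so that `sub`, `repository`, and `aud` can be compared with `bound_claims`. `jwt_payload` is a hypothetical helper, it does not verify the signature, and how the token is obtained inside a job is outside its scope.

```bash
#!/usr/bin/env bash
# jwt_payload TOKEN
# Print the decoded JSON payload (second dot-separated segment) of a JWT.
# Inspection only: the signature is NOT verified.
jwt_payload() {
  local token="$1" payload pad
  # Convert base64url to standard base64 ('_' -> '/', '-' -> '+').
  payload="$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')"
  # base64url omits padding; restore it so that base64 -d accepts the input.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
}

# Usage (the variable is a placeholder for a token captured in a debug run):
#   jwt_payload "$OIDC_TOKEN"
```

Treat the token as a credential while it is valid: decode it in a throwaway debug run and keep the raw value out of logs.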