Commit Graph

203 Commits

bfbe038ab2 Release/v1.1.5 (#6)
* ci: backport release/* source validation workflow to release/v1.1.5 (#3)

Make the existing release/v1.1.5 branch carry the gate workflow itself (pull_request_target runs the version from the base branch).
详见 iac_modules/docs/tldr-github-branch-model.md

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* backport: support customizable AI_WORKSPACE_AUTH_TOKEN in deployment workflow

* ci: support customizable AI_WORKSPACE_AUTH_TOKEN in deployment workflow (#5)

- Add AI_WORKSPACE_AUTH_TOKEN to Vault KV secret reads (provision + deploy jobs)
- Add ai_workspace_auth_token as optional workflow_dispatch input parameter
- Allow runtime override of auth token (input takes precedence over Vault)
- Include TLDR token generation guidance in workflow description
- Wire token through all-in-one bootstrap with precedence: input > Vault

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 16:34:21 +08:00
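The "input > Vault" precedence for AI_WORKSPACE_AUTH_TOKEN described above can be sketched as a small resolver. This is a minimal Python illustration, not the workflow's actual code. The function name `resolve_auth_token` and the treatment of blank inputs as "not set" are assumptions.

```python
from typing import Mapping, Optional


def resolve_auth_token(dispatch_input: Optional[str],
                       vault_outputs: Mapping[str, str]) -> Optional[str]:
    """Pick AI_WORKSPACE_AUTH_TOKEN with the precedence input > Vault.

    Blank or whitespace-only values are treated as "not set" (an assumption).
    """
    for candidate in (dispatch_input, vault_outputs.get("AI_WORKSPACE_AUTH_TOKEN")):
        if candidate and candidate.strip():
            return candidate.strip()
    # Neither source provided a token; the caller decides what happens next.
    return None
```

An empty workflow_dispatch field then falls through to the Vault value instead of overriding it with an empty string.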
b29b85025b ci: support customizable AI_WORKSPACE_AUTH_TOKEN in deployment workflow (#4)
* ci: backport release/* source validation workflow to release/v1.1.5

Make the existing release/v1.1.5 branch carry the gate workflow itself (pull_request_target runs the version from the base branch).
详见 iac_modules/docs/tldr-github-branch-model.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci: support customizable AI_WORKSPACE_AUTH_TOKEN in deployment workflow

- Add AI_WORKSPACE_AUTH_TOKEN to Vault KV secret reads (provision + deploy jobs)
- Add ai_workspace_auth_token as optional workflow_dispatch input parameter
- Allow runtime override of auth token (input takes precedence over Vault)
- Include TLDR token generation guidance in workflow description
- Wire token through all-in-one bootstrap with precedence: input > Vault

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 16:12:49 +08:00
6eb16afb14 ci: add release/* branch source validation workflow (#2)
release/* only accepts PRs from hotfix/* branches or PRs labeled cherry-pick/backport.
详见 iac_modules/docs/tldr-github-branch-model.md

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 12:12:21 +08:00
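The release/* source rule above boils down to a single predicate. The sketch below assumes the gate looks at the PR head branch name and at exact, case-sensitive label names `cherry-pick` and `backport`. In a pull_request_target workflow those inputs would come from the event payload (`pull_request.base.ref`, `pull_request.head.ref`, `pull_request.labels[*].name`). The function name is hypothetical.

```python
from typing import Iterable

# Labels that, per the commit message, let a non-hotfix PR into release/*.
ALLOWED_LABELS = {"cherry-pick", "backport"}


def release_pr_allowed(base_ref: str, head_ref: str, labels: Iterable[str]) -> bool:
    """Return True if a PR into base_ref passes the release/* source gate."""
    if not base_ref.startswith("release/"):
        return True  # the gate only guards release/* branches
    if head_ref.startswith("hotfix/"):
        return True
    return any(label in ALLOWED_LABELS for label in labels)
```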
3ce3c6fb66 fix(iac): require Cloudflare DNS token 2026-06-27 13:48:20 +08:00
2d3289fbc5 fix(installer): resolve local macOS patcher after cwd changes 2026-06-27 09:02:08 +08:00
5093e21e35 fix(installer): use checked-in macOS patcher locally 2026-06-27 08:58:37 +08:00
50c2d85a14 fix(installer): keep macOS OpenClaw plugin on stable path 2026-06-27 08:56:52 +08:00
974904be13 ci: update workflow actions for node 24 2026-06-26 19:05:39 +08:00
338d057375 feat(ci): add provider key wiring toggles 2026-06-26 18:30:29 +08:00
50070c0708 fix(ci): pass tfstate credentials to inventory render 2026-06-26 18:15:35 +08:00
12b5805fb5 fix(ci): pass tfstate credentials to terraform apply 2026-06-26 18:12:21 +08:00
002257ce5b fix(ci): source tf state region from vault 2026-06-26 18:10:28 +08:00
3b270f4959 fix(ci): pin aws tfstate region for s3 backend 2026-06-26 18:07:52 +08:00
8f8e925706 fix(ci): require tf state region from vault 2026-06-26 17:50:04 +08:00
a72e580ae6 fix(ci): default tf state region to us-east-1 2026-06-26 17:47:49 +08:00
26a4794f2f docs(verify): record clean green IaC↔Ansible run + nodejs/resolver fixes
Both hosts reached RC=0 on a single on-host curl|bash bootstrap; console 17000=200,
api 8788 up, litellm 4000=200 "I'm alive!" (incl. ubuntu26 uv-Py3.13), caddy active;
FQDN hostnames set; VPS destroyed, instances=0. Adds fixes #12 (nodejs self-ref
recursion / omit-sentinel leak) and #13 (browser resolver skips disabled chromium stub).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 11:15:24 +08:00
029ef9fc13 chore(mcp): add local MCP debug tooling (github/terraform/ssh servers)
Local MCP debug setup: launcher scripts, config, setup script, and EN/ZH docs.
Secrets live in config/mcp/local-mcp.env (gitignored); commit a sanitized
local-mcp.env.example template instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:56:43 +08:00
5a76c5ed06 fix(deploy): on-host bootstrap defaults to online mode (pull fixed main playbooks)
The deploy job ran curl|bash without setting AI_WORKSPACE_OFFLINE_MODE, so it
resolved to auto and installed the stale offline package, which still ships the
pinned-Chrome / root-PGDATA playbooks that were already fixed on playbooks main.
The pipeline kept failing at the Chrome task.

- run-on-host-bootstrap.sh: thread AI_WORKSPACE_OFFLINE_MODE (default off) into the
  remote env so the bootstrap git-clones latest main instead of the stale package.
- workflow: add offline_mode input (off|auto|force, default off); flip back to auto
  once the offline package is republished with the fixes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:47:14 +08:00
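Threading AI_WORKSPACE_OFFLINE_MODE into the remote environment, as described for run-on-host-bootstrap.sh, can be sketched as building the command string that ssh runs on the host. The real helper is a shell script; this Python version is only an illustration. The bootstrap URL is a placeholder and the function name is hypothetical. The variable is set on the `bash` side of the pipe so the bootstrap script sees it.

```python
import shlex

OFFLINE_MODES = ("off", "auto", "force")


def build_remote_bootstrap_cmd(bootstrap_url: str, offline_mode: str = "off") -> str:
    """Build the shell command run on the host over ssh.

    AI_WORKSPACE_OFFLINE_MODE is placed in the environment of the bash that
    executes the bootstrap (default "off", i.e. clone the latest main).
    """
    if offline_mode not in OFFLINE_MODES:
        raise ValueError(f"offline_mode must be one of {OFFLINE_MODES}, got {offline_mode!r}")
    return (
        f"curl -fsSL {shlex.quote(bootstrap_url)} | "
        f"AI_WORKSPACE_OFFLINE_MODE={shlex.quote(offline_mode)} bash"
    )
```

Validating the mode up front means a typo in the workflow input fails the job instead of silently selecting some default on the host.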
09a8bae35d fix(iac-workflow): make S3-compatible remote state mandatory (no local fallback)
Previously 'Configure remote backend' had `if: TF_STATE_BUCKET != ''`, so when
the gate evaluated empty the step was skipped and terraform silently fell back to
local state — risking state loss on destroy. TF_STATE_* exist in Vault, so make
the remote backend the default required path:

- Validate step now requires TF_STATE_{ENDPOINT,BUCKET,ACCESS_KEY,SECRET_KEY}
- 'Configure remote backend' always runs (renders backend.tf)
- terraform init fails fast if TF_STATE_BUCKET empty (removed local-state else)
- header comment updated: backend keys are required, not optional

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:09:44 +08:00
5ce6dad9bc fix(iac-workflow): change TF_STATE_REGION fallback from us-east-1 to auto
Cloudflare R2 S3-compatible backend requires region=auto; the previous
fallback us-east-1 would cause terraform init to fail if Vault key is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 21:01:13 +08:00
e39b16e92f fix(ci): checkout bootstrap helper in deploy job 2026-06-25 20:53:33 +08:00
fbfa32ca2a fix(ci): poll on-host bootstrap logs across ssh reconnects 2026-06-25 20:48:20 +08:00
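One way to poll on-host bootstrap logs across ssh reconnects is to track a byte offset and open a fresh ssh connection per poll, asking only for the bytes not yet seen. This is a sketch of that idea, not necessarily what fbfa32ca2a implements. The log path, the completion marker, and the function names are hypothetical. The runner is injectable so the loop can be exercised without a real host.

```python
import shlex
import subprocess
import time
from typing import Callable, List

# A runner takes an argv list and returns stdout bytes, raising on failure.
Runner = Callable[[List[str]], bytes]


def ssh_runner(argv: List[str]) -> bytes:
    return subprocess.run(argv, check=True, capture_output=True, timeout=60).stdout


def poll_remote_log(host: str, log_path: str, done_marker: bytes,
                    run: Runner = ssh_runner, interval: float = 5.0,
                    max_polls: int = 720) -> bytes:
    """Follow a remote log by byte offset so a dropped ssh session loses nothing.

    Every poll opens a fresh ssh connection and fetches only the bytes after
    `offset`; a failed connection is simply retried on the next poll.
    """
    offset = 0
    collected = b""
    for _ in range(max_polls):
        try:
            # `tail -c +N` starts output at byte N (1-based), i.e. after `offset` bytes.
            chunk = run(["ssh", host, f"tail -c +{offset + 1} {shlex.quote(log_path)}"])
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
            chunk = b""  # connection dropped: keep the offset and retry
        offset += len(chunk)
        collected += chunk
        if done_marker in collected:
            return collected
        time.sleep(interval)
    raise TimeoutError(f"{done_marker!r} not seen in {log_path} on {host}")
```

Because the bootstrap itself keeps running on the host (the log is only read remotely), a reconnect costs nothing but one poll interval.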
cd630c45d5 fix(bootstrap): allow online fallback after offline installer failure 2026-06-25 19:57:46 +08:00
12d9bb327f refactor(ci): move render_backend_tf.py to ai-workspace-infra vultr-vps/scripts/
The script moves from xworkspace-console/scripts/ to vultr-vps/scripts/ in
ai-workspace-infra and is referenced through the existing Checkout iac_modules step,
so no extra self-checkout of xw-console is needed; the workflow and CLAUDE.md paths are updated to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 12:02:48 +08:00
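9b3687e189 fix(ci): remove all heredocs from the workflow in favor of external script calls
- Remove the shell heredoc from the Configure remote backend step (it caused a YAML syntax error at L191)
- Add the external script scripts/render_backend_tf.py, which renders backend.tf from the TF_STATE_ENDPOINT env var
- Add a Checkout xworkspace-console step to the provision job so scripts/ is available on the runner
- Add CLAUDE.md, which explicitly forbids inline heredocs (shell/python) in workflows and requires external scripts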

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 11:48:47 +08:00
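An external renderer in the spirit of scripts/render_backend_tf.py could look like the sketch below. It is an illustration under assumptions, not the actual script: the function names are hypothetical, and the backend arguments shown are only the ones these commits mention (endpoints, use_path_style, skip_s3_checksum, skip_metadata_api_check). bucket/key/region are left for `terraform init -backend-config`.

```python
from pathlib import Path

# HCL for an S3-compatible backend such as Cloudflare R2. bucket/key/region are
# deliberately absent: they are passed via `terraform init -backend-config=...`.
BACKEND_TEMPLATE = """\
terraform {{
  backend "s3" {{
    endpoints = {{
      s3 = "{endpoint}"
    }}
    use_path_style          = true
    skip_s3_checksum        = true
    skip_metadata_api_check = true
  }}
}}
"""


def render_backend_hcl(endpoint: str) -> str:
    """Return backend.tf contents; refuse to render without an endpoint."""
    if not endpoint:
        # Fail fast rather than letting terraform fall back to local state.
        raise ValueError("TF_STATE_ENDPOINT is empty")
    if '"' in endpoint or "\n" in endpoint:
        raise ValueError("endpoint must be a plain URL")
    return BACKEND_TEMPLATE.format(endpoint=endpoint)


def write_backend_tf(workdir: str, endpoint: str) -> Path:
    """Write backend.tf into the Terraform working directory."""
    path = Path(workdir) / "backend.tf"
    path.write_text(render_backend_hcl(endpoint))
    return path
```

Keeping the template in a script file sidesteps the YAML indentation problems that inline heredocs caused in the workflow.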
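f636366699 fix(ci): revert backend.tf rendering to a shell heredoc, fixing the YAML syntax error caused by inline Jinja2 Python
The lines of the inline Python script (python3 - <<'PYEOF'...PYEOF) started at column 1,
outside the indentation of the YAML literal block, so the whole workflow file failed
YAML parsing and GitHub lost the workflow_dispatch trigger.

Revert to a shell heredoc (<<TFEOF, unquoted so variables expand),
keeping the force_path_style → use_path_style upgrade.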

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 11:45:44 +08:00
4a6057d58b fix(ci): render backend.tf with Jinja2 + replace force_path_style → use_path_style
- Switch the Configure remote backend step from a shell heredoc to Python Jinja2 rendering,
  avoiding shell quoting/escaping issues and matching the rendering style of generate.py
- force_path_style is deprecated in Terraform 1.9; use use_path_style instead

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 11:37:25 +08:00
b4c051e6c0 fix(ci): write the R2 endpoint into the backend.tf HCL instead of a -backend-config flag
The endpoints block of the Terraform S3 backend can only be set in the HCL configuration;
it cannot be passed as a -backend-config command-line argument (Terraform rejects both
the endpoints={s3=...} and the endpoints.s3=... forms).

Instead, the Configure remote backend step uses an unquoted heredoc to expand
TF_STATE_ENDPOINT into backend.tf, and terraform init passes only
bucket/key/region via -backend-config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 11:29:35 +08:00
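With the endpoint living in backend.tf, the init call only needs bucket, key and region. A minimal sketch of building that argv follows; the workdir and state key in the usage are hypothetical. Credentials are assumed to arrive through the environment (the S3 backend reads the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables), so they never appear on the command line.

```python
from typing import List


def terraform_init_argv(workdir: str, bucket: str, key: str, region: str) -> List[str]:
    """Build the `terraform init` argv; the endpoint lives in backend.tf, not here."""
    missing = [name for name, value in
               (("bucket", bucket), ("key", key), ("region", region)) if not value]
    if missing:
        # Fail fast instead of letting terraform fall back to local state.
        raise ValueError(f"missing backend settings: {', '.join(missing)}")
    return [
        "terraform", f"-chdir={workdir}", "init", "-input=false",
        f"-backend-config=bucket={bucket}",
        f"-backend-config=key={key}",
        f"-backend-config=region={region}",
    ]
```

The list can be passed straight to `subprocess.run(argv, check=True)`, which avoids shell quoting entirely.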
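b9ec7a2e45 fix(ci): fix R2 TF state backend endpoint syntax + complete prerequisites docs
- Change the endpoints={s3="..."} HCL map syntax in terraform init -backend-config
  to the endpoints.s3=... dot syntax (the former is invalid in a -backend-config flag,
  so the R2 endpoint was never passed and Terraform fell back to the default AWS endpoint, failing signature verification)
- Complete the TLDR prerequisites comment at the top of the workflow (6 items)
- Add docs/operations/iac-prerequisites.md (complete prerequisites guide, including R2 setup)
- Add a §7 cross-reference to vault-github-actions.md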

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 11:13:16 +08:00
d225ff74e2 fix(ci): fix terraform s3 backend SignatureDoesNotMatch error in dynamically generated backend config
- Add skip_s3_checksum = true and skip_metadata_api_check = true to s3 backend config
- Use endpoints = { s3 = ... } instead of deprecated endpoint parameter in terraform init
2026-06-25 10:46:01 +08:00
4b1f809937 ci: checkout playbooks and iac_modules from public repos
- Stop checking out the old private mono-repo `ai-workspace-infra`.
- Checkout the split public repositories `ai-workspace-infra/playbooks` and `ai-workspace-infra/iac_modules` separately.
- Remove `CODEX_GITHUB_PERSONAL_ACCESS_TOKEN` (`INFRA_REPO_TOKEN`) dependency from vault as it's no longer needed for public repos.
2026-06-25 10:14:15 +08:00
4231afc399 docs: refine latest verification (FQDN hostname on both hosts, litellm up on debian13, remaining items)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 22:00:03 +08:00
6df0990014 docs(operations): record acp-retry/litellm-uv/FQDN/non-empty fixes + verification status
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 21:58:21 +08:00
d3356a0ef0 docs(operations): end-to-end IaC<->Ansible dynamic-inventory workflow
Documents the YAML->generate.py->terraform->cmdb.json->ansible flow, the FQDN
inventory_hostname contract, the two execution models, the Vault-OIDC pipeline,
the non-empty/fail-fast checks, and the key fixes that make it work end to end.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 20:51:54 +08:00
c2cd3035a4 ci(deploy-iac): fail fast on missing required Vault secrets
Add a 'Validate required secrets' run-step after each job's Vault OIDC
load step. It checks that the REQUIRED steps.vault.outputs.* are non-empty via
env: mapping (never echoes secret values), and on any empty key prints a
::error:: naming the key + its Vault path then exit 1. The deploy job
requires at least one of ANSIBLE_SSH_KEY_B64 / ANSIBLE_SSH_KEY. Optional
keys (INFRA_REPO_TOKEN, TF_STATE_*) are not validated. Vault path strings
in error messages reference the env.VAULT_KV[_OPENCLAW] vars rather than
hardcoded literals.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 20:46:30 +08:00
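The 'Validate required secrets' behaviour described above (report empty keys with their Vault path, never print values, require at least one of two SSH key variants, exit 1) can be sketched as follows. The function names and the shape of the `required` mapping (key name to Vault path) are assumptions; `::error::` is the standard GitHub Actions workflow command for error annotations.

```python
import sys
from typing import List, Mapping, Sequence


def missing_secret_errors(env: Mapping[str, str], required: Mapping[str, str],
                          any_of: Sequence[Sequence[str]] = ()) -> List[str]:
    """Return one ::error:: line per problem. Only key names and paths are printed."""
    errors = []
    for key, vault_path in required.items():
        if not env.get(key):
            errors.append(f"::error::required secret {key} is empty (Vault path: {vault_path})")
    for group in any_of:
        if not any(env.get(key) for key in group):
            errors.append(f"::error::at least one of {' / '.join(group)} must be set")
    return errors


def check_or_exit(env: Mapping[str, str], required: Mapping[str, str],
                  any_of: Sequence[Sequence[str]] = ()) -> None:
    """Print every problem, then fail the step if there was any."""
    errors = missing_secret_errors(env, required, any_of)
    for line in errors:
        print(line)
    if errors:
        sys.exit(1)
```

Collecting all problems before exiting lets one failed run report every missing key instead of one at a time.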
fa808eae80 fix(bootstrap): inventory_hostname from FQDN, not hardcoded 127.0.0.1
On-host ansible-playbook -c local now uses XWORKMATE_BRIDGE_DOMAIN (sourced from
CMDB service_domains via the pipeline) or the host FQDN as inventory_hostname,
falling back to 127.0.0.1 only when no valid FQDN exists. Keeps -c local.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 20:42:27 +08:00
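The inventory_hostname choice above (XWORKMATE_BRIDGE_DOMAIN, then the host FQDN, then 127.0.0.1) can be sketched as below. What counts as a "valid FQDN" is an assumption here: dotted, not a localhost name, and not an IP literal. The function names are hypothetical.

```python
import ipaddress
import os
import socket
from typing import Optional


def _is_fqdn(name: Optional[str]) -> bool:
    """Loose FQDN check (assumption): dotted, not localhost, not an IP literal."""
    if not name or "." not in name or name.startswith("localhost"):
        return False
    try:
        ipaddress.ip_address(name)
        return False  # an IP literal is not a hostname
    except ValueError:
        return True


def pick_inventory_hostname(bridge_domain: Optional[str] = None,
                            host_fqdn: Optional[str] = None) -> str:
    """XWORKMATE_BRIDGE_DOMAIN first, then the host FQDN, else 127.0.0.1."""
    if bridge_domain is None:
        bridge_domain = os.environ.get("XWORKMATE_BRIDGE_DOMAIN")
    if host_fqdn is None:
        host_fqdn = socket.getfqdn()
    for candidate in (bridge_domain, host_fqdn):
        if _is_fqdn(candidate):
            return candidate.rstrip(".")
    return "127.0.0.1"
```

The result could then be used as a one-host inventory, e.g. `ansible-playbook -c local -i "NAME," ...`, so connection stays local while role logic sees the real hostname.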
fe479bc4b4 ci(deploy-iac): pass XWORKMATE_BRIDGE_DOMAIN (override or CMDB service_domains) to on-host bootstrap
New optional 'bridge_domain' input overrides; otherwise derive from each host's
cmdb.json host_vars.service_domains (first entry) and inject as
XWORKMATE_BRIDGE_DOMAIN so the host sets /etc/hostname + xworkmate-bridge.caddy
from it (on-host model has no inventory hostvars).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:56:45 +08:00
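Deriving XWORKMATE_BRIDGE_DOMAIN per host (workflow input first, else the first service_domains entry) might look like this. The cmdb.json layout `{"hosts": {"<name>": {"host_vars": {...}}}}` is a hypothetical stand-in; the real file may be structured differently. Function names are also hypothetical.

```python
import json
from typing import Any, Dict, Optional


def bridge_domain_for(host: Dict[str, Any], override: Optional[str] = None) -> Optional[str]:
    """The bridge_domain input wins; otherwise the host's first service_domains entry."""
    if override:
        return override
    domains = host.get("host_vars", {}).get("service_domains") or []
    return domains[0] if domains else None


def bridge_domains(cmdb_json: str, override: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map host name to XWORKMATE_BRIDGE_DOMAIN for every host in the CMDB."""
    cmdb = json.loads(cmdb_json)
    return {name: bridge_domain_for(host, override)
            for name, host in cmdb.get("hosts", {}).items()}
```

A `None` result means no domain is injected for that host, leaving the on-host FQDN fallback in charge.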
607c995a9a ci+docs(vault): read LLM keys from kv/openclaw, SSH/infra/cloudflare from kv/CICD
DEEPSEEK/NVIDIA/OLLAMA_API_KEY live in kv/data/openclaw (not CICD); vault-action
reads them from that path in the same step. Policy grants read on both
kv/data/CICD and kv/data/openclaw.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:35:25 +08:00
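The key mapping across kv/data/CICD and kv/data/openclaw can be written down as data and rendered into the `secrets:` input format of hashicorp/vault-action, which takes one `path key | OUTPUT ;` entry per line and exposes each value as `steps.<id>.outputs.OUTPUT`. The workflow writes this YAML by hand; the generator below is only an illustration. The mapping reflects the key names given in 5d852e0285 and 607c995a9a, and the Python names are hypothetical.

```python
from typing import Dict, List, Tuple

VAULT_KV = "kv/data/CICD"
VAULT_KV_OPENCLAW = "kv/data/openclaw"

# Output name -> (Vault path, key stored at that path).
SECRET_MAP: Dict[str, Tuple[str, str]] = {
    "VULTR_API_KEY": (VAULT_KV, "VULTR_API_KEY"),
    "ANSIBLE_SSH_KEY": (VAULT_KV, "SSH_PRIVATE_DEPLOY_KEY"),
    "ANSIBLE_SSH_KEY_B64": (VAULT_KV, "SSH_PRIVATE_DEPLOY_KEY_B64"),
    "CLOUDFLARE_DNS_API_TOKEN": (VAULT_KV, "CLOUDFLARE_DNS_API_TOKEN"),
    "DEEPSEEK_API_KEY": (VAULT_KV_OPENCLAW, "DEEPSEEK_API_KEY"),
    "NVIDIA_API_KEY": (VAULT_KV_OPENCLAW, "NVIDIA_API_KEY"),
    "OLLAMA_API_KEY": (VAULT_KV_OPENCLAW, "OLLAMA_API_KEY"),
}


def vault_action_secrets(outputs: List[str]) -> str:
    """Render vault-action `secrets:` lines for the outputs a job needs."""
    lines = []
    for output in outputs:
        path, key = SECRET_MAP[output]
        lines.append(f"{path} {key} | {output} ;")
    return "\n".join(lines)
```

Because a single vault-action step can read from several paths, one step per job is enough even when keys are split across kv/CICD and kv/openclaw.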
dba85dad04 docs(ci): fix header comment to kv/CICD + actual key names
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:31:30 +08:00
5d852e0285 ci+docs(vault): read shared kv/CICD with existing key names
- VAULT_KV -> kv/data/CICD (shared CICD secrets), map existing keys to outputs:
  CODEX_GITHUB_PERSONAL_ACCESS_TOKEN->INFRA_REPO_TOKEN,
  SSH_PRIVATE_DEPLOY_KEY[_B64]->ANSIBLE_SSH_KEY[_B64],
  CLOUDFLARE_DNS_API_TOKEN direct; VULTR_API_KEY/LLM keys same name.
- docs: policy reads kv/data/CICD; field table maps existing keys; note the
  three LLM keys still need to be added to kv/CICD, and SSH_PUBLIC_DEPLOY_KEY
  must match hosts.yaml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:31:00 +08:00
04d349073e ci+docs(vault): SSH key B64-preferred pattern + xworkspace-console Vault setup
- deploy job: read ANSIBLE_SSH_KEY_B64 (preferred) + ANSIBLE_SSH_KEY (fallback)
  from Vault, decode/write ~/.ssh/id_deploy and ssh-keygen -y self-check —
  matches the org SSH-deploy runbook (avoids multiline-key libcrypto errors).
- docs/operations/vault-github-actions.md: full Vault role/policy/jwt/KV setup
  for github-actions-xworkspace-console, mirroring the existing org records.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:21:01 +08:00
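The B64-preferred SSH key handling above (decode ANSIBLE_SSH_KEY_B64 if present, otherwise use ANSIBLE_SSH_KEY, write ~/.ssh/id_deploy, self-check with `ssh-keygen -y`) can be sketched as follows. The function names are hypothetical, and forcing a trailing newline is an added safeguard, not something the commit states.

```python
import base64
import binascii
import os
import subprocess
from pathlib import Path
from typing import Optional


def deploy_key_bytes(key_b64: Optional[str], key_raw: Optional[str]) -> bytes:
    """Prefer ANSIBLE_SSH_KEY_B64, fall back to the multiline ANSIBLE_SSH_KEY."""
    if key_b64:
        try:
            # Drop any whitespace/line wraps before strict base64 decoding.
            data = base64.b64decode("".join(key_b64.split()), validate=True)
        except binascii.Error as exc:
            raise ValueError("ANSIBLE_SSH_KEY_B64 is not valid base64") from exc
    elif key_raw:
        data = key_raw.encode()
    else:
        raise ValueError("neither ANSIBLE_SSH_KEY_B64 nor ANSIBLE_SSH_KEY is set")
    # A private key without a trailing newline is a common cause of
    # "error in libcrypto" from OpenSSH, so always end with one.
    return data if data.endswith(b"\n") else data + b"\n"


def write_deploy_key(data: bytes, path: Optional[Path] = None, self_check: bool = True) -> Path:
    """Write the key with 0600 permissions and optionally verify it parses."""
    path = path or Path.home() / ".ssh" / "id_deploy"
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    os.chmod(path, 0o600)  # also tighten a file that already existed
    if self_check:
        # `ssh-keygen -y` derives the public key and fails on a malformed private key.
        subprocess.run(["ssh-keygen", "-y", "-f", str(path)], check=True, capture_output=True)
    return path
```

Storing the key base64-encoded avoids newline mangling when multiline values pass through secret stores and environment variables.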
75d3098d1c ci(deploy-iac): fetch secrets from Vault KV via GitHub OIDC
Replace GitHub Actions Secrets with HashiCorp Vault (https://vault.svc.plus):
- permissions: id-token: write; auth via hashicorp/vault-action@v2 (method=jwt,
  role=github-actions-xworkspace-console, audience=vault) — no static token.
- Each job loads only the keys it needs from kv/data/github-actions/xworkspace-console
  (VULTR_API_KEY, INFRA_REPO_TOKEN, ANSIBLE_SSH_KEY, CLOUDFLARE_API_TOKEN,
  DEEPSEEK/NVIDIA/OLLAMA_API_KEY, optional TF_STATE_*).
- Backend gating now keys off the Vault output (steps.vault.outputs.TF_STATE_BUCKET).
- Drop unused 'playbook' input (deploy is on-host bootstrap).

Pattern mirrors xworkmate-app/.github/workflows/build-and-release.yml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:17:46 +08:00
e74f2334e3 docs(setup): complete optional-parameters manual for curl|bash bootstrap
Expand the all-in-one setup guide (zh+en) into a full reference of the
bootstrap script's supported options, grouped by purpose: subcommands
(uninstall/--purge), public-exposure & security, unified auth-token chain,
runtime modes, offline package, performance/locks, source/version overrides.
Fix the inaccurate TOKEN var -> AI_WORKSPACE_AUTH_TOKEN (the real precedence
chain). Sourced from scripts/setup-ai-workspace-all-in-one.sh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 11:26:01 +08:00
b2c8c5d875 ci+docs: on-host bootstrap deploy job + console serving/verification updates
- deploy-ai-workspace-iac.yaml: deploy job now SSHes into each host and runs
  the official curl|bash bootstrap locally (host-side ansible -c local,
  offline-accelerated), instead of running all-in-one from the runner (which
  breaks on roles/agent_skills delegate_to: localhost). provision job kept as
  the batch-provision mode.
- docs/operations: record final console fix (local python static backend),
  caddy/public-access architecture, and debian13/ubuntu26.04/macOS verification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 09:44:22 +08:00
e47b15a5f0 docs(operations): IaC + Ansible dynamic-inventory deploy verification & fixes
Records the IaC->inventory->deploy linkage, offline-package linkage
verification, the local-on-host execution finding, the 5 fixes applied to
playbooks, and the remaining console static-serve + pipeline TODOs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 03:02:59 +08:00
b039a36a69 ci: align deploy pipeline with shared scripts/templates layout
generate.py moved to vultr-vps/scripts/ and provider/variables/cloud-init to
templates/; run render/inventory from VPS_ROOT via scripts/generate.py, keep
terraform -chdir in the env workdir.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:23:48 +08:00
7c46dffde2 ci: add IaC + Ansible + Cloudflare matrix deploy pipeline
Matrix pipeline that provisions Vultr hosts via iac_modules vultr-vps
ai-workspace env (Terraform), derives the deploy matrix from the rendered
CMDB, deploys per-host with Ansible all-in-one, then syncs Cloudflare DNS.
Pipelining off + PYTHONWARNINGS=ignore for Python 3.13 targets.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:02:32 +08:00
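Deriving the per-host deploy matrix from the rendered CMDB can be sketched as turning cmdb.json into a GitHub Actions `include` matrix. The layout `{"hosts": {"<name>": {"host_vars": {"ansible_host": ...}}}}` and the matrix field names are hypothetical. The empty-matrix check is in the spirit of the non-empty/fail-fast checks recorded in later docs commits.

```python
import json
from typing import Any, Dict, List


def deploy_matrix(cmdb_json: str) -> str:
    """Build a GitHub Actions matrix from a (hypothetical) cmdb.json layout:
    {"hosts": {"<name>": {"host_vars": {"ansible_host": "<ip>", ...}}}}
    """
    cmdb = json.loads(cmdb_json)
    include: List[Dict[str, Any]] = []
    for name, host in sorted(cmdb.get("hosts", {}).items()):
        include.append({"host": name, "ip": host.get("host_vars", {}).get("ansible_host", "")})
    if not include:
        # An empty matrix would let the deploy job "succeed" without deploying anything.
        raise ValueError("cmdb.json lists no hosts")
    return json.dumps({"include": include}, separators=(",", ":"))
```

The compact JSON would typically be appended to `$GITHUB_OUTPUT` as `matrix=...` and consumed by the deploy job through `fromJSON(...)` in its `strategy.matrix`.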
0f289383e2 Update README.md 2026-06-23 14:53:35 +08:00
2832716076 Revise model names in README for accuracy
Updated model names for clarity and consistency.
2026-06-23 14:52:35 +08:00
15aa1d2c25 Fix model names in registration instructions
Updated model registration instructions and corrected names.
2026-06-23 14:52:04 +08:00