Compare commits

...

25 Commits

Author SHA1 Message Date
e6643bdc4d
ci: honor AI_WORKSPACE_AUTH_TOKEN from input/Vault, pass through to host (#11)
Restore the Vault kv/CICD/AI_WORKSPACE_AUTH_TOKEN read in the deploy job
(the key now exists) and resolve the bootstrap token with a clear
precedence: workflow_dispatch input overrides, else Vault value, else
the on-host installer's resolve_unified_auth_token reuses the persisted
~/.ai_workspace_auth_token or generates a new one.

Also fix run-on-host-bootstrap.sh which silently dropped
AI_WORKSPACE_AUTH_TOKEN: it is now written to the remote env payload and
exported, so an input/Vault-provided token is actually honored on the
host instead of being regenerated. Empty stays empty so the no-arg
curl|bash install path still self-generates.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 17:02:33 +08:00
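The precedence described in this commit message (workflow input, then Vault, then the persisted file, then a fresh token) can be sketched in a few lines. This is only an illustrative Python model, not the installer's `resolve_unified_auth_token`: the function name `resolve_auth_token`, its parameters, and the choice to persist only generated tokens are assumptions made for the example.

```python
import uuid
from pathlib import Path


def resolve_auth_token(input_token: str, vault_token: str, persisted: Path) -> str:
    """Hypothetical model of the precedence: input > Vault > persisted file > generated."""
    # A non-empty workflow input wins; otherwise a non-empty Vault value is used.
    for candidate in (input_token, vault_token):
        if candidate:
            return candidate
    # Both empty: reuse a previously persisted token if one exists.
    if persisted.exists():
        saved = persisted.read_text().strip()
        if saved:
            return saved
    # Nothing available: generate a UUID token and persist it for later runs.
    token = str(uuid.uuid4())
    persisted.write_text(token + "\n")
    return token
```

Calling it twice with empty input and Vault values returns the same token on the second call, which is the "reuse the persisted token" behavior the message describes.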
537315f0fc
ci: remove AI_WORKSPACE_AUTH_TOKEN from vault-action reads (#9)
vault-action ignoreNotFound only suppresses path-level 404, not missing
keys within an existing path. Token is now sourced exclusively from the
ai_workspace_auth_token workflow_dispatch input.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-28 16:55:38 +08:00
ddae3b3574
ci: simplify AI_WORKSPACE_AUTH_TOKEN input description for consistency (#7)
Remove openssl rand -hex 32 alternative (format inconsistent with UUID output).
Standardize to UUID-only generation hint matching existing input description style.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-28 16:45:09 +08:00
bfbe038ab2
Release/v1.1.5 (#6)
* ci: backport release/* source validation workflow to release/v1.1.5 (#3)

Make the existing release/v1.1.5 branch carry the gate workflow itself (pull_request_target runs the base branch's version).
See iac_modules/docs/tldr-github-branch-model.md

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* backport: support customizable AI_WORKSPACE_AUTH_TOKEN in deployment workflow

* ci: support customizable AI_WORKSPACE_AUTH_TOKEN in deployment workflow (#5)

- Add AI_WORKSPACE_AUTH_TOKEN to Vault KV secret reads (provision + deploy jobs)
- Add ai_workspace_auth_token as optional workflow_dispatch input parameter
- Allow runtime override of auth token (input takes precedence over Vault)
- Include TLDR token generation guidance in workflow description
- Wire token through all-in-one bootstrap with precedence: input > Vault

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 16:34:21 +08:00
b29b85025b
ci: support customizable AI_WORKSPACE_AUTH_TOKEN in deployment workflow (#4)
* ci: backport release/* source validation workflow to release/v1.1.5

Make the existing release/v1.1.5 branch carry the gate workflow itself (pull_request_target runs the base branch's version).
See iac_modules/docs/tldr-github-branch-model.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ci: support customizable AI_WORKSPACE_AUTH_TOKEN in deployment workflow

- Add AI_WORKSPACE_AUTH_TOKEN to Vault KV secret reads (provision + deploy jobs)
- Add ai_workspace_auth_token as optional workflow_dispatch input parameter
- Allow runtime override of auth token (input takes precedence over Vault)
- Include TLDR token generation guidance in workflow description
- Wire token through all-in-one bootstrap with precedence: input > Vault

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 16:12:49 +08:00
6eb16afb14
ci: add release/* branch source validation workflow (#2)
release/* accepts only PRs from hotfix/* or PRs labeled cherry-pick/backport.
See iac_modules/docs/tldr-github-branch-model.md

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 12:12:21 +08:00
3ce3c6fb66 fix(iac): require Cloudflare DNS token 2026-06-27 13:48:20 +08:00
2d3289fbc5 fix(installer): resolve local macOS patcher after cwd changes 2026-06-27 09:02:08 +08:00
5093e21e35 fix(installer): use checked-in macOS patcher locally 2026-06-27 08:58:37 +08:00
50c2d85a14 fix(installer): keep macOS OpenClaw plugin on stable path 2026-06-27 08:56:52 +08:00
974904be13 ci: update workflow actions for node 24 2026-06-26 19:05:39 +08:00
338d057375 feat(ci): add provider key wiring toggles 2026-06-26 18:30:29 +08:00
50070c0708 fix(ci): pass tfstate credentials to inventory render 2026-06-26 18:15:35 +08:00
12b5805fb5 fix(ci): pass tfstate credentials to terraform apply 2026-06-26 18:12:21 +08:00
002257ce5b fix(ci): source tf state region from vault 2026-06-26 18:10:28 +08:00
3b270f4959 fix(ci): pin aws tfstate region for s3 backend 2026-06-26 18:07:52 +08:00
8f8e925706 fix(ci): require tf state region from vault 2026-06-26 17:50:04 +08:00
a72e580ae6 fix(ci): default tf state region to us-east-1 2026-06-26 17:47:49 +08:00
26a4794f2f docs(verify): record clean green IaC↔Ansible run + nodejs/resolver fixes
Both hosts reached RC=0 on a single on-host curl|bash bootstrap; console 17000=200,
api 8788 up, litellm 4000=200 "I'm alive!" (incl. ubuntu26 uv-Py3.13), caddy active;
FQDN hostnames set; VPS destroyed, instances=0. Adds fixes #12 (nodejs self-ref
recursion / omit-sentinel leak) and #13 (browser resolver skips disabled chromium stub).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 11:15:24 +08:00
029ef9fc13 chore(mcp): add local MCP debug tooling (github/terraform/ssh servers)
Local MCP debug setup: launcher scripts, config, setup script, and EN/ZH docs.
Secrets live in config/mcp/local-mcp.env (gitignored); commit a sanitized
local-mcp.env.example template instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:56:43 +08:00
5a76c5ed06 fix(deploy): on-host bootstrap defaults to online mode (pull fixed main playbooks)
The deploy job ran curl|bash with no AI_WORKSPACE_OFFLINE_MODE -> auto -> stale
offline package, which still ships the pinned-Chrome / root-PGDATA playbooks that
were already fixed in playbooks main. Pipeline kept failing at the Chrome task.

- run-on-host-bootstrap.sh: thread AI_WORKSPACE_OFFLINE_MODE (default off) into the
  remote env so the bootstrap git-clones latest main instead of the stale package.
- workflow: add offline_mode input (off|auto|force, default off); flip back to auto
  once the offline package is republished with the fixes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:47:14 +08:00
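The "thread the variable into the remote env" step in this commit amounts to writing shell-safe `KEY=value` lines that the host later sources. A minimal Python sketch of that idea follows, assuming `shlex.quote` as a stand-in for the quoting the real script does; `render_env_payload` and the example domain are hypothetical names used only for illustration.

```python
import shlex


def render_env_payload(values: dict[str, str]) -> str:
    """Render KEY=value lines that a POSIX shell can source safely."""
    # shlex.quote leaves simple words untouched and single-quotes everything else;
    # an empty value becomes '' so the variable is present but empty.
    lines = [f"{name}={shlex.quote(value)}" for name, value in values.items()]
    return "\n".join(lines) + "\n"


payload = render_env_payload({
    "AI_WORKSPACE_OFFLINE_MODE": "off",
    "XWORKMATE_BRIDGE_DOMAIN": "bridge.example.com",  # placeholder domain
    "DEEPSEEK_API_KEY": "",
})
print(payload, end="")
```

Quoting every value keeps a secret containing spaces or shell metacharacters from being reinterpreted on the host.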
09a8bae35d fix(iac-workflow): make S3-compatible remote state mandatory (no local fallback)
Previously 'Configure remote backend' had `if: TF_STATE_BUCKET != ''`, so when
the gate evaluated empty the step was skipped and terraform silently fell back to
local state — risking state loss on destroy. TF_STATE_* exist in Vault, so make
the remote backend the default required path:

- Validate step now requires TF_STATE_{ENDPOINT,BUCKET,ACCESS_KEY,SECRET_KEY}
- 'Configure remote backend' always runs (renders backend.tf)
- terraform init fails fast if TF_STATE_BUCKET empty (removed local-state else)
- header comment updated: backend keys are required, not optional

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:09:44 +08:00
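The fail-fast validation this commit describes can be modeled as "collect every missing key, then abort once". The sketch below is a Python illustration under that assumption; the workflow itself does this in bash, and `missing_secrets` is a hypothetical helper name.

```python
# Keys named in the commit message as required for the remote state backend.
REQUIRED_STATE_KEYS = (
    "TF_STATE_ENDPOINT",
    "TF_STATE_BUCKET",
    "TF_STATE_ACCESS_KEY",
    "TF_STATE_SECRET_KEY",
)


def missing_secrets(env: dict[str, str], required=REQUIRED_STATE_KEYS) -> list[str]:
    """Return the required keys that are absent or empty, without exposing any values."""
    return [key for key in required if not env.get(key)]


missing = missing_secrets({"TF_STATE_BUCKET": "ai-workspace-tfstate"})
for key in missing:
    # Only key names are reported, never secret values.
    print(f"::error::missing required secret {key}")
```

Reporting all missing keys before exiting saves a rerun per key when several are absent at once.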
5ce6dad9bc fix(iac-workflow): change TF_STATE_REGION fallback from us-east-1 to auto
Cloudflare R2 S3-compatible backend requires region=auto; the previous
fallback us-east-1 would cause terraform init to fail if Vault key is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 21:01:13 +08:00
e39b16e92f fix(ci): checkout bootstrap helper in deploy job 2026-06-25 20:53:33 +08:00
fbfa32ca2a fix(ci): poll on-host bootstrap logs across ssh reconnects 2026-06-25 20:48:20 +08:00
17 changed files with 616 additions and 78 deletions

View File

@ -19,17 +19,19 @@ name: Deploy AI Workspace (IaC + Ansible + Cloudflare)
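# SSH_PRIVATE_DEPLOY_KEY_B64 → deploy SSH private key, base64-encoded (used by deploy to log in to hosts; preferred)
# SSH_PRIVATE_DEPLOY_KEY → the same key in raw multi-line form (fallback; one of the two is required)
# CLOUDFLARE_DNS_API_TOKEN → CF Zone DNS Edit token (used for DNS sync)
# CLOUDFLARE_API_TOKEN → legacy name kept for compatibility; the DNS job prefers CLOUDFLARE_DNS_API_TOKEN
# kv/openclaw:
# DEEPSEEK_API_KEY → LLM provider key (injected into the host by deploy)
# NVIDIA_API_KEY → same as above
# OLLAMA_API_KEY → same as above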
#
# 4. Vault KV optional keys (if unset, local state is used; set them in production so destroy does not lose state)
# 4. Vault KV required keys (the remote S3-compatible state backend is mandatory; missing keys fail fast,
# with no fallback to local state, so destroy cannot lose state)
# kv/CICD:
# TF_STATE_ENDPOINT → S3-compatible object storage API URL (e.g. https://ewr1.vultrobjects.com)
# TF_STATE_ENDPOINT → S3-compatible object storage API URL (e.g. https://<acct>.r2.cloudflarestorage.com)
# TF_STATE_BUCKET → bucket name (e.g. ai-workspace-tfstate)
# TF_STATE_ACCESS_KEY / TF_STATE_SECRET_KEY → object storage credentials
# TF_STATE_REGION → region (us-east-1 for Vultr; must be auto for Cloudflare R2)
# TF_STATE_REGION → region (must be auto for Cloudflare R2; us-east-1 for Vultr)
# → object storage setup guide: docs/operations/iac-prerequisites.md §3
#
# 5. ai-workspace-infra private repo (optional acceleration)
@ -41,6 +43,15 @@ name: Deploy AI Workspace (IaC + Ansible + Cloudflare)
# ai-workspace-infra/vultr-vps/config/resources/ai-workspace-hosts.yaml
# (its ssh_keys[].public entries); otherwise the runner cannot SSH in after Terraform creates the hosts.
#
# 7. AI_WORKSPACE_AUTH_TOKEN: unified service token (LiteLLM master key / bridge / vault, etc.)
# - Three-level precedence (resolved in one place by the on-host installer's resolve_unified_auth_token):
# 1) workflow_dispatch input ai_workspace_auth_token (overrides when non-empty)
# 2) Vault kv/CICD/AI_WORKSPACE_AUTH_TOKEN (fallback when the input is empty)
# 3) both empty: the installer reuses ~/.ai_workspace_auth_token or generates and persists a new one
# - Storage: vault kv patch kv/CICD AI_WORKSPACE_AUTH_TOKEN=<your-token>
# - TL;DR generation: python3 -c 'import uuid; print(uuid.uuid4())'
# - Passed through run-on-host-bootstrap.sh into the host env and injected into each all-in-one role
#
# ── Pipeline structure ───────────────────────────────────────────────────────
#
# provision : bulk host provisioning mode (switches: terraform_action=apply / run_deploy)
@ -75,6 +86,12 @@ on:
required: false
default: ""
type: string
offline_mode:
description: "on-host offline package mode: off=pull latest main online (default; use when the offline package lags behind); auto=offline acceleration; force=force offline"
required: false
default: "off"
type: choice
options: ["off", "auto", "force"]
terraform_action:
description: "apply creates/updates; destroy tears down"
required: false
@ -91,6 +108,26 @@ on:
required: false
default: true
type: boolean
use_deepseek:
description: "Whether to wire in the DeepSeek API key"
required: false
default: true
type: boolean
use_nvidia:
description: "Whether to wire in the NVIDIA API key"
required: false
default: true
type: boolean
use_ollama:
description: "Whether to wire in the Ollama API key"
required: false
default: true
type: boolean
ai_workspace_auth_token:
description: "AI Workspace auth token override (leave empty to use Vault kv/CICD/AI_WORKSPACE_AUTH_TOKEN; generate with: python3 -c 'import uuid; print(uuid.uuid4())')"
required: false
default: ""
type: string
# id-token: write is for Vault GitHub OIDC (JWT) auth; contents: read is for checking out code
permissions:
@ -124,7 +161,7 @@ jobs:
steps:
- name: Load Vault secrets (OIDC)
id: vault
uses: hashicorp/vault-action@v2
uses: hashicorp/vault-action@v4
with:
url: ${{ env.VAULT_ADDR }}
method: jwt
@ -137,31 +174,46 @@ jobs:
${{ env.VAULT_KV }} TF_STATE_BUCKET | TF_STATE_BUCKET ;
${{ env.VAULT_KV }} TF_STATE_ACCESS_KEY | TF_STATE_ACCESS_KEY ;
${{ env.VAULT_KV }} TF_STATE_SECRET_KEY | TF_STATE_SECRET_KEY ;
${{ env.VAULT_KV }} TF_STATE_REGION | TF_STATE_REGION
${{ env.VAULT_KV }} TF_STATE_REGION | TF_STATE_REGION ;
${{ env.VAULT_KV }} CLOUDFLARE_DNS_API_TOKEN | CLOUDFLARE_DNS_API_TOKEN ;
${{ env.VAULT_KV }} CLOUDFLARE_API_TOKEN | CLOUDFLARE_API_TOKEN
- name: Validate required secrets
env:
VULTR_API_KEY: ${{ steps.vault.outputs.VULTR_API_KEY }}
TF_STATE_ENDPOINT: ${{ steps.vault.outputs.TF_STATE_ENDPOINT }}
TF_STATE_BUCKET: ${{ steps.vault.outputs.TF_STATE_BUCKET }}
TF_STATE_ACCESS_KEY: ${{ steps.vault.outputs.TF_STATE_ACCESS_KEY }}
TF_STATE_SECRET_KEY: ${{ steps.vault.outputs.TF_STATE_SECRET_KEY }}
TF_STATE_REGION: ${{ steps.vault.outputs.TF_STATE_REGION }}
CLOUDFLARE_DNS_API_TOKEN: ${{ steps.vault.outputs.CLOUDFLARE_DNS_API_TOKEN }}
CLOUDFLARE_API_TOKEN: ${{ steps.vault.outputs.CLOUDFLARE_API_TOKEN }}
run: |
set -euo pipefail
# Only check that REQUIRED secrets are non-empty (no values are printed, emptiness check only); optional keys
# (INFRA_REPO_TOKEN / TF_STATE_*) are not checked here.
# Check that REQUIRED secrets are non-empty (no values are printed, emptiness check only).
# The remote S3-compatible state backend is mandatory (on by default, no fallback to local state).
missing=0
if [ -z "${VULTR_API_KEY:-}" ]; then
echo "::error::Missing required secret VULTR_API_KEY (Vault: ${VAULT_KV}/VULTR_API_KEY)"
missing=1
fi
for k in TF_STATE_ENDPOINT TF_STATE_BUCKET TF_STATE_ACCESS_KEY TF_STATE_SECRET_KEY TF_STATE_REGION; do
if [ -z "$(eval echo \"\${$k:-}\")" ]; then
echo "::error::Missing required secret $k (Vault: ${VAULT_KV}/$k): the remote S3 state backend is mandatory"
missing=1
fi
done
[ "$missing" -eq 0 ] || { echo "::error::Required secrets missing, aborting provision"; exit 1; }
- name: Checkout iac_modules
uses: actions/checkout@v4
uses: actions/checkout@v7
with:
repository: ai-workspace-infra/iac_modules
ref: ${{ github.event.inputs.infra_ref || 'main' }}
path: infra/iac_modules
- name: Checkout playbooks
uses: actions/checkout@v4
uses: actions/checkout@v7
with:
repository: ai-workspace-infra/playbooks
ref: ${{ github.event.inputs.infra_ref || 'main' }}
@ -171,18 +223,18 @@ jobs:
with:
terraform_version: "1.9.8"
- uses: actions/setup-python@v5
- uses: actions/setup-python@v6
with:
python-version: "3.12"
- name: Install render deps
run: pip install --quiet pyyaml jinja2
- name: Configure remote backend (optional)
if: ${{ steps.vault.outputs.TF_STATE_BUCKET != '' }}
- name: Configure remote backend (S3-compatible, required)
working-directory: ${{ env.ENV_DIR }}
env:
TF_STATE_ENDPOINT: ${{ steps.vault.outputs.TF_STATE_ENDPOINT }}
TF_STATE_REGION: ${{ steps.vault.outputs.TF_STATE_REGION }}
run: python3 $GITHUB_WORKSPACE/${{ env.VPS_ROOT }}/scripts/render_backend_tf.py backend.tf
- name: generate.py render (YAML -> explicit HCL + tfvars)
@ -199,19 +251,22 @@ jobs:
TF_STATE_REGION: ${{ steps.vault.outputs.TF_STATE_REGION }}
run: |
set -euo pipefail
if [ -n "${TF_STATE_BUCKET}" ]; then
terraform init -input=false \
-backend-config="bucket=${TF_STATE_BUCKET}" \
-backend-config="key=ai-workspace/terraform.tfstate" \
-backend-config="region=${TF_STATE_REGION:-us-east-1}"
else
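echo "::warning::Remote state not configured (no TF_STATE_BUCKET in Vault); using local state (only suitable for one-off demos; destroy must happen in the same run)"
terraform init -input=false
# The remote S3-compatible state backend is mandatory (backend.tf was rendered by the previous step);
# a missing bucket fails immediately, with no fallback to local state.
if [ -z "${TF_STATE_BUCKET}" ]; then
echo "::error::TF_STATE_BUCKET is empty: the remote state backend is mandatory, aborting"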
exit 1
fi
terraform init -input=false \
-backend-config="bucket=${TF_STATE_BUCKET}" \
-backend-config="key=ai-workspace/terraform.tfstate" \
-backend-config="region=${TF_STATE_REGION}"
- name: Terraform ${{ github.event.inputs.terraform_action || 'apply' }}
working-directory: ${{ env.ENV_DIR }}
env:
AWS_ACCESS_KEY_ID: ${{ steps.vault.outputs.TF_STATE_ACCESS_KEY }}
AWS_SECRET_ACCESS_KEY: ${{ steps.vault.outputs.TF_STATE_SECRET_KEY }}
TF_VAR_vultr_api_key: ${{ steps.vault.outputs.VULTR_API_KEY }}
run: |
set -euo pipefail
@ -220,6 +275,9 @@ jobs:
- name: generate.py inventory (terraform output + YAML -> cmdb.json + inventory.ini)
if: ${{ (github.event.inputs.terraform_action || 'apply') == 'apply' }}
working-directory: ${{ env.VPS_ROOT }}
env:
AWS_ACCESS_KEY_ID: ${{ steps.vault.outputs.TF_STATE_ACCESS_KEY }}
AWS_SECRET_ACCESS_KEY: ${{ steps.vault.outputs.TF_STATE_SECRET_KEY }}
run: python3 scripts/generate.py inventory
- name: Build deploy matrix from cmdb.json
@ -235,7 +293,7 @@ jobs:
- name: Upload CMDB + inventory artifact
if: ${{ (github.event.inputs.terraform_action || 'apply') == 'apply' }}
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v7
with:
name: ai-workspace-cmdb
path: |
@ -260,7 +318,7 @@ jobs:
# Run the official bootstrap script: exactly the same path as a self-hosting user's curl|bash.
- name: Load Vault secrets (OIDC)
id: vault
uses: hashicorp/vault-action@v2
uses: hashicorp/vault-action@v4
with:
url: ${{ env.VAULT_ADDR }}
method: jwt
@ -272,15 +330,23 @@ jobs:
${{ env.VAULT_KV }} SSH_PRIVATE_DEPLOY_KEY_B64 | ANSIBLE_SSH_KEY_B64 ;
${{ env.VAULT_KV_OPENCLAW }} DEEPSEEK_API_KEY | DEEPSEEK_API_KEY ;
${{ env.VAULT_KV_OPENCLAW }} NVIDIA_API_KEY | NVIDIA_API_KEY ;
${{ env.VAULT_KV_OPENCLAW }} OLLAMA_API_KEY | OLLAMA_API_KEY
${{ env.VAULT_KV_OPENCLAW }} OLLAMA_API_KEY | OLLAMA_API_KEY ;
${{ env.VAULT_KV }} AI_WORKSPACE_AUTH_TOKEN | AI_WORKSPACE_AUTH_TOKEN
- name: Report provider key wiring
run: |
set -euo pipefail
echo "DeepSeek: ${{ github.event.inputs.use_deepseek == 'false' && 'skipped' || 'enabled' }}"
echo "NVIDIA: ${{ github.event.inputs.use_nvidia == 'false' && 'skipped' || 'enabled' }}"
echo "Ollama: ${{ github.event.inputs.use_ollama == 'false' && 'skipped' || 'enabled' }}"
- name: Validate required secrets
env:
ANSIBLE_SSH_KEY: ${{ steps.vault.outputs.ANSIBLE_SSH_KEY }}
ANSIBLE_SSH_KEY_B64: ${{ steps.vault.outputs.ANSIBLE_SSH_KEY_B64 }}
DEEPSEEK_API_KEY: ${{ steps.vault.outputs.DEEPSEEK_API_KEY }}
NVIDIA_API_KEY: ${{ steps.vault.outputs.NVIDIA_API_KEY }}
OLLAMA_API_KEY: ${{ steps.vault.outputs.OLLAMA_API_KEY }}
# `cond && '' || key` always yields key because '' is falsy; test the enabled case first.
DEEPSEEK_API_KEY: ${{ github.event.inputs.use_deepseek != 'false' && steps.vault.outputs.DEEPSEEK_API_KEY || '' }}
NVIDIA_API_KEY: ${{ github.event.inputs.use_nvidia != 'false' && steps.vault.outputs.NVIDIA_API_KEY || '' }}
OLLAMA_API_KEY: ${{ github.event.inputs.use_ollama != 'false' && steps.vault.outputs.OLLAMA_API_KEY || '' }}
run: |
set -euo pipefail
# Only check that REQUIRED secrets are non-empty (no values are printed, emptiness check only).
@ -290,22 +356,25 @@ jobs:
echo "::error::Missing required secret: SSH private key (Vault: ${VAULT_KV}/SSH_PRIVATE_DEPLOY_KEY_B64 or ${VAULT_KV}/SSH_PRIVATE_DEPLOY_KEY, at least one)"
missing=1
fi
if [ -z "${DEEPSEEK_API_KEY:-}" ]; then
if [ "${{ github.event.inputs.use_deepseek || 'true' }}" = "true" ] && [ -z "${DEEPSEEK_API_KEY:-}" ]; then
echo "::error::Missing required secret DEEPSEEK_API_KEY (Vault: ${VAULT_KV_OPENCLAW}/DEEPSEEK_API_KEY)"
missing=1
fi
if [ -z "${NVIDIA_API_KEY:-}" ]; then
if [ "${{ github.event.inputs.use_nvidia || 'true' }}" = "true" ] && [ -z "${NVIDIA_API_KEY:-}" ]; then
echo "::error::Missing required secret NVIDIA_API_KEY (Vault: ${VAULT_KV_OPENCLAW}/NVIDIA_API_KEY)"
missing=1
fi
if [ -z "${OLLAMA_API_KEY:-}" ]; then
if [ "${{ github.event.inputs.use_ollama || 'true' }}" = "true" ] && [ -z "${OLLAMA_API_KEY:-}" ]; then
echo "::error::Missing required secret OLLAMA_API_KEY (Vault: ${VAULT_KV_OPENCLAW}/OLLAMA_API_KEY)"
missing=1
fi
[ "$missing" -eq 0 ] || { echo "::error::Required secrets missing, aborting deploy"; exit 1; }
- name: Checkout xworkspace-console helpers
uses: actions/checkout@v7
- name: Download CMDB (host IP source)
uses: actions/download-artifact@v4
uses: actions/download-artifact@v8
with:
name: ai-workspace-cmdb
path: cmdb
@ -342,31 +411,20 @@ jobs:
- name: Run on-host bootstrap (curl | bash, local-mode install)
env:
DEEPSEEK_API_KEY: ${{ steps.vault.outputs.DEEPSEEK_API_KEY }}
NVIDIA_API_KEY: ${{ steps.vault.outputs.NVIDIA_API_KEY }}
OLLAMA_API_KEY: ${{ steps.vault.outputs.OLLAMA_API_KEY }}
run: |
set -euo pipefail
ip="$(jq -r '.["${{ matrix.host }}"].ip' cmdb/cmdb.json)"
user="$(jq -r '.["${{ matrix.host }}"].ansible_user // "root"' cmdb/cmdb.json)"
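# bridge domain = operator override (input), otherwise the first entry of each host's CMDB service_domains;
# used for /etc/hostname and xworkmate-bridge.caddy (the on-host model has no access to the inventory,
# so the pipeline injects it as the XWORKMATE_BRIDGE_DOMAIN env).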
domain='${{ github.event.inputs.bridge_domain }}'
if [ -z "$domain" ]; then
domain="$(jq -r '.["${{ matrix.host }}"].host_vars.service_domains // ""' cmdb/cmdb.json | cut -d, -f1 | tr -d ' ')"
fi
echo "Bootstrapping ${{ matrix.host }} (${user}@${ip}) on-host, domain=${domain:-<none>} ..."
ssh -i ~/.ssh/id_deploy \
-o StrictHostKeyChecking=accept-new \
-o ServerAliveInterval=20 -o ServerAliveCountMax=15 \
-o ConnectTimeout=20 \
"${user}@${ip}" \
"XWORKMATE_BRIDGE_DOMAIN='${domain}' \
DEEPSEEK_API_KEY='${DEEPSEEK_API_KEY}' \
NVIDIA_API_KEY='${NVIDIA_API_KEY}' \
OLLAMA_API_KEY='${OLLAMA_API_KEY}' \
bash -lc 'curl -sfL https://install.svc.plus/ai-workspace | bash -'"
MATRIX_HOST: ${{ matrix.host }}
CMDB_PATH: cmdb/cmdb.json
SSH_KEY_PATH: ~/.ssh/id_deploy
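# When the offline package lags behind main, use online mode to pull the latest playbooks (see run-on-host-bootstrap.sh);
# once the offline package is republished, set this to auto to restore offline acceleration.
AI_WORKSPACE_OFFLINE_MODE: ${{ github.event.inputs.offline_mode || 'off' }}
XWORKMATE_BRIDGE_DOMAIN: ${{ github.event.inputs.bridge_domain }}
# A non-empty input overrides; otherwise Vault kv/CICD/AI_WORKSPACE_AUTH_TOKEN is used.
# If both are empty, the on-host installer (resolve_unified_auth_token) generates and persists a token.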
AI_WORKSPACE_AUTH_TOKEN: ${{ github.event.inputs.ai_workspace_auth_token != '' && github.event.inputs.ai_workspace_auth_token || steps.vault.outputs.AI_WORKSPACE_AUTH_TOKEN }}
# `cond && '' || key` always yields key because '' is falsy; test the enabled case first.
DEEPSEEK_API_KEY: ${{ github.event.inputs.use_deepseek != 'false' && steps.vault.outputs.DEEPSEEK_API_KEY || '' }}
NVIDIA_API_KEY: ${{ github.event.inputs.use_nvidia != 'false' && steps.vault.outputs.NVIDIA_API_KEY || '' }}
OLLAMA_API_KEY: ${{ github.event.inputs.use_ollama != 'false' && steps.vault.outputs.OLLAMA_API_KEY || '' }}
run: bash scripts/run-on-host-bootstrap.sh
# ---------------------------------------------------------------------------
dns:
@ -377,7 +435,7 @@ jobs:
steps:
- name: Load Vault secrets (OIDC)
id: vault
uses: hashicorp/vault-action@v2
uses: hashicorp/vault-action@v4
with:
url: ${{ env.VAULT_ADDR }}
method: jwt
@ -401,19 +459,19 @@ jobs:
[ "$missing" -eq 0 ] || { echo "::error::必需机密缺失,终止 dns"; exit 1; }
- name: Checkout playbooks
uses: actions/checkout@v4
uses: actions/checkout@v7
with:
repository: ai-workspace-infra/playbooks
ref: ${{ github.event.inputs.infra_ref || 'main' }}
path: infra/playbooks
- name: Download CMDB + inventory
uses: actions/download-artifact@v4
uses: actions/download-artifact@v8
with:
name: ai-workspace-cmdb
path: cmdb
- uses: actions/setup-python@v5
- uses: actions/setup-python@v6
with:
python-version: "3.12"
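A note on the `cond && a || b` pattern that the provider-key toggles in this workflow rely on: it only behaves like a ternary when `a` is truthy, and an empty string is falsy. Python's `and`/`or` return operands the same way, so the sketch below uses them as a stand-in model for the two possible spellings. The function names are hypothetical and the model is an illustration, not the Actions expression evaluator itself.

```python
def toggle_empty_first(use_flag: str, vault_key: str) -> str:
    """Models: use_flag == 'false' && '' || vault_key."""
    # When the flag is 'false', the `and` yields '' (falsy), so `or` falls through to the key.
    return (use_flag == "false" and "") or vault_key


def toggle_key_first(use_flag: str, vault_key: str) -> str:
    """Models: use_flag != 'false' && vault_key || ''."""
    # When the flag is 'false', the `and` yields False, so `or` yields ''.
    return (use_flag != "false" and vault_key) or ""


for flag in ("true", "false"):
    print(flag, repr(toggle_empty_first(flag, "sk-demo")), repr(toggle_key_first(flag, "sk-demo")))
```

Only the second spelling drops the key when the toggle is off, which is why the enabled case is worth putting on the `&&` side.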

View File

@ -0,0 +1,44 @@
name: Validate Release PR
# Release policy gate for release/* branches: only PRs from hotfix/* or PRs labeled cherry-pick/backport are accepted.
# See iac_modules/docs/tldr-github-branch-model.md
on:
pull_request_target:
types: [opened, synchronize, reopened, labeled, unlabeled]
permissions:
contents: read
pull-requests: read
jobs:
validate-release-source:
runs-on: ubuntu-latest
if: startsWith(github.base_ref, 'release/')
steps:
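- name: Check PR source branch
# Pass PR metadata through env rather than interpolating ${{ }} into the script:
# under pull_request_target the head branch name is attacker-controlled.
env:
SRC: ${{ github.head_ref }}
TGT: ${{ github.base_ref }}
LABELS: ${{ join(github.event.pull_request.labels.*.name, ',') }}
run: |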
echo "🔍 Validating PR into release branch"
echo " source: $SRC"
echo " target: $TGT"
echo " labels: $LABELS"
if [[ "$SRC" =~ ^hotfix/ ]]; then
echo "✅ Allowed: hotfix/* branch"
exit 0
fi
if [[ "$LABELS" =~ (^|,)(cherry-pick|backport)(,|$) ]]; then
echo "✅ Allowed: cherry-pick/backport labeled PR"
exit 0
fi
echo "❌ Rejected."
echo "release/* accepts only:"
echo " - PRs from hotfix/*"
echo " - PRs labeled cherry-pick or backport (backport/cherry-pick of an already verified feature)"
echo "Merging directly from main / develop / feature/* into release/* is forbidden."
exit 1
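The gate rule in this workflow is small enough to state as a pure function, which also makes it easy to unit-test outside CI. The following is a Python sketch of the same rule under the assumption that labels arrive as a list of names; `release_pr_allowed` is a hypothetical helper, not part of the repository.

```python
ALLOWED_LABELS = {"cherry-pick", "backport"}


def release_pr_allowed(source_branch: str, labels: list[str]) -> bool:
    """Return True if a PR into release/* passes the source-branch gate."""
    # Rule 1: hotfix/* branches are always accepted.
    if source_branch.startswith("hotfix/"):
        return True
    # Rule 2: otherwise an exact cherry-pick or backport label is required.
    return any(label in ALLOWED_LABELS for label in labels)


print(release_pr_allowed("hotfix/token-leak", []))         # hotfix branch
print(release_pr_allowed("feature/new-ui", ["backport"]))  # labeled backport
print(release_pr_allowed("feature/new-ui", ["docs"]))      # neither rule applies
```

Matching labels exactly, rather than by substring, mirrors the anchored regular expression in the shell version, so a label such as `backport-needed` does not pass.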

3
.gitignore vendored
View File

@ -53,3 +53,6 @@ coverage/
*.textClipping
scripts/__pycache__/
# local MCP debug secrets (contains a real PAT) — never commit
config/mcp/local-mcp.env

View File

@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
. "${ROOT_DIR}/config/mcp/local-mcp.env"
exec docker run --rm -i \
-e GITHUB_PERSONAL_ACCESS_TOKEN \
-e GITHUB_TOOLSETS=default,actions \
ghcr.io/github/github-mcp-server:latest

View File

@ -0,0 +1,3 @@
#!/usr/bin/env bash
set -euo pipefail
exec npx -y mcp-ssh-manager@latest

View File

@ -0,0 +1,5 @@
#!/usr/bin/env bash
set -euo pipefail
exec docker run --rm -i \
ghcr.io/hashicorp/terraform-mcp-server:latest \
--toolsets=registry

View File

@ -0,0 +1,13 @@
{
"mcpServers": {
"github": {
"command": "/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/bin/github-mcp-server.sh"
},
"terraform": {
"command": "/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/bin/terraform-mcp-server.sh"
},
"ssh-manager": {
"command": "/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/bin/mcp-ssh-manager.sh"
}
}
}

View File

@ -0,0 +1 @@
GITHUB_PERSONAL_ACCESS_TOKEN=ghp_REPLACE_WITH_YOUR_TOKEN

View File

@ -0,0 +1,35 @@
# Local MCP Debug Pack
This pack is tuned for local debugging with a small tool surface.
## Included
- `github-mcp-server`
- `terraform-mcp-server`
- `mcp-ssh-manager`
- `ansible.mcp` as an Ansible collection dependency, not a standalone MCP daemon
## One-step setup
```bash
cd /Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console
./scripts/setup-local-mcp-debug.sh
```
The script writes:
- `/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/local-mcp-config.json`
- `/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/local-mcp.env`
- `/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/bin/*.sh`
## Recommended defaults
- GitHub MCP stays on the minimal default toolset
- Terraform MCP stays on `registry`
- SSH Manager runs through `npx` to avoid a global install
- The GitHub token stays local in `local-mcp.env`
## Required env vars
- `GITHUB_PERSONAL_ACCESS_TOKEN`
- `TFC_TOKEN` only if you need Terraform Cloud / Enterprise access

View File

@ -53,6 +53,8 @@ config/resources/ai-workspace-hosts.yaml (IaC 声明, 唯一人工入口)
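| 9 | `acp_server_opencode` ACP endpoint check times out | Probed ~1s after the service (re)starts: the adapter has accepted TCP but not yet answered; `uri` defaults to 30s and `retries/until` does not actually loop on a connection timeout, so it fails on the first attempt | Switched to a **curl retry loop** (5s per attempt, up to ~30 attempts); once the adapter is ready, `acp.capabilities` returns 200 in ~4ms |
| 10 | litellm × Python 3.14 (Ubuntu 26.04 only) | The pinned litellm fork requires `<3.14`; Ubuntu 26.04 ships system Python 3.14 and apt has no 3.13/3.12 → `pip install` reports "requires a different Python" | When the system interpreter is ≥3.14, use **`uv` to install a standalone Python 3.13** and rebuild the venv; Debian 13 (3.13) is unaffected |
| 11 | `inventory_hostname` hard-coded to a short name/127.0.0.1 | Host identity/hostname/caddy site name mismatch | `generate.py` uses the first **FQDN** in `service_domains` as the CMDB/inventory key; the on-host `.sh` passes the FQDN to `-i`; the bridge role sets `/etc/hostname` and the caddy site name from it |
| 12 | `nodejs_version` self-reference (Ansible 2.19 recursion) | xfce includes the nodejs role with `nodejs_version: "{{ ai_agent_runtime_nodejs_version \| default(nodejs_version) }}"`; 2.19+ lazy templating reports `Recursive loop detected`, and the `nodejs_version_major` set_fact fails | Use an explicit fallback, `default('22.22.3', true)`. **Pitfall**: in include_role vars, `default(omit)` does not fall back to the role default but injects the omit placeholder, which renders as the repository URL `node_<<Omit>>.x` and makes apt update fail |
| 13 | Browser resolver picks a disabled stub | The ai_agent_runtime resolver only checks existence via `command -v`/`-x`, so it picks the disabled `/usr/local/bin/chromium` stub installed by xfce (exit 126) instead of google-chrome → `Check chromium version` fails with rc=126 (triggered by reruns/role ordering) | The resolver now actually runs `<candidate> --version` as a check, skips the stub, and resolves to google-chrome |
Deploy-side hardening (stability of long-lived control connections): `ANSIBLE_SSH_ARGS` with `ServerAliveInterval/ControlPersist`, `ANSIBLE_SSH_RETRIES`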
@ -78,12 +80,16 @@ config/resources/ai-workspace-hosts.yaml (IaC 声明, 唯一人工入口)
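- console (served via python) + api (bin path) came up active directly on two fresh hosts, 17000=200 (previously console was crash-looping).
- **FQDN hostname** verified working on ubuntu; the agent_skills refactor and the lock_timeout fixes (bridge/fail2ban) are both past.
**litellm / qmd (latest round):** the fixes made significant progress:
- **FQDN hostname works on both hosts** (`xworkmate-bridge-debian-13/ubuntu-26.svc.plus`).
- **debian13: litellm `:4000` healthy, 200 (up)**, console/api active, openclaw activating. The new cause of `rc≠0` on this host is a component's **pinned SHA `236c83a5…` failing git checkout** (likely a force-push or deleted commit; unrelated to this round's deploy fixes, it is a component version-pin issue).
- **ubuntu26: litellm `:4000=000` (not up)**; still needs investigation after uv-Py3.13 (#10) (pending).
**Final clean round (IaC provisioning → on-host `curl|bash` → `RC=0` on both hosts): all green:**
→ In other words, console/api/FQDN and part of litellm are closed out; the remaining two items (component SHA checkout, ubuntu litellm wrap-up) are left for a later clean rerun to pin down.
| Platform | hostname | curl\|bash | 17000(console) | 8788(api) | 4000(litellm) | caddy |
|------|----------|-----------|----------------|-----------|---------------|-------|
| debian13 | `xworkmate-bridge-debian-13.svc.plus` ✓ | **RC=0** | 200 | up (404, no root route) | **200 `"I'm alive!"`** | active |
| ubuntu26.04 | `xworkmate-bridge-ubuntu-26.svc.plus` ✓ | **RC=0** | 200 | up | **200 `"I'm alive!"`** (uv-Py3.13 #10 in effect) | active |
- Running units: `caddy.service` / `litellm-proxy.service` / `xworkmate-bridge.service` (console/api answer on their own ports).
- **#12 nodejs recursion + #13 resolver stub** are the two issues newly identified and fixed this round (both on the active deploy path, not artifacts of a dirty machine); together with #1 to #11, both hosts reach `RC=0` in a single pass.
- `terraform destroy` runs right after verification (2 instances + ssh key; the Vultr API confirms instances=0, no billing residue).
- deploy pipeline: the deploy job in `deploy-ai-workspace-iac.yaml` now SSHes to the host and runs the curl|bash bootstrap locally (matching the local execution model + offline acceleration); the provision job remains as the bulk provisioning mode; secrets are fetched via Vault OIDC.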
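Fix #13 in the table above replaces an existence check with an execution check. The idea can be sketched as follows; this is a Python illustration only (the actual resolver lives in the Ansible role), and `resolve_browser`, `runs_version`, and the injectable `probe`/`which` parameters are assumptions made so the logic can be exercised without real browsers installed.

```python
import shutil
import subprocess
from typing import Callable, Iterable, Optional


def runs_version(path: str) -> bool:
    """Return True only if `<path> --version` actually executes and exits 0."""
    try:
        result = subprocess.run([path, "--version"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def resolve_browser(
    candidates: Iterable[str],
    probe: Callable[[str], bool] = runs_version,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Pick the first candidate that both exists on PATH and passes the probe."""
    for name in candidates:
        path = which(name)
        # Existence alone is not enough: a disabled stub exists but fails the probe.
        if path and probe(path):
            return path
    return None
```

With a stub `chromium` that exits 126, `resolve_browser(["chromium", "google-chrome"])` skips it and returns the google-chrome path, provided that binary is installed.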

View File

@ -107,6 +107,10 @@ vault kv patch kv/CICD \
```
- Set `TF_STATE_ENDPOINT=https://ewr1.vultrobjects.com` and `TF_STATE_REGION=us-east-1`
**AWS S3**
- If the backend is a standard AWS S3 bucket, `TF_STATE_ENDPOINT` is usually the S3 API endpoint itself, for example `https://s3.us-east-1.amazonaws.com`
- `TF_STATE_REGION` must match the bucket's region; for a us-east-1 bucket such as `ai-workspace-tfstate`, use `us-east-1`
**Cloudflare R2** (CF is already in use; no egress fees)
- Dashboard → R2 → create a bucket → Manage API Tokens → create a read/write token
- `TF_STATE_ENDPOINT=https://<account_id>.r2.cloudflarestorage.com`

View File

@ -0,0 +1,41 @@
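# Local MCP Debug Pack
This debug pack targets local integration debugging of `xworkspace-console`; the goal is as few install steps and as small an exposed MCP tool surface as possible.
## Scope
- `github-mcp-server`
- `terraform-mcp-server`
- `mcp-ssh-manager`
- `ansible.mcp` is installed as an Ansible collection dependency, not a standalone MCP service
## One-step setup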
```bash
cd /Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console
./scripts/setup-local-mcp-debug.sh
```
The script generates:
- `/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/local-mcp-config.json`
- `/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/local-mcp.env`
- `/Users/shenlan/workspaces/ai-workspace-lab/xworkspace-console/config/mcp/bin/*.sh`
## Recommended usage
- GitHub MCP enables only the minimal surface of the `default` toolset by default, plus a few commonly used toolsets
- Terraform MCP enables only `registry` by default
- SSH Manager starts via `npx` to avoid a global install
- The GitHub token is written only to the local `local-mcp.env` and never enters chat content
## Required environment variables
- `GITHUB_PERSONAL_ACCESS_TOKEN`
- `TFC_TOKEN` is needed only when connecting to Terraform Cloud / Enterprise
## Debugging tips
- Start with GitHub MCP to reproduce action / PR / repo issues
- Then enable the Terraform MCP `terraform` toolset as needed
- SSH Manager is for remote host debugging and does not affect the other two servers

View File

@ -675,17 +675,61 @@ def main():
" become_user: \"{{ gateway_openclaw_service_user }}\"\n"
" when: ansible_os_family == 'Darwin'\n"
"\n"
"- name: Link openclaw-multi-session-plugins to extensions (macOS)\n"
"- name: Inspect installed openclaw-multi-session-plugins path (macOS)\n"
" ansible.builtin.stat:\n"
" path: \"{{ gateway_openclaw_home }}/.openclaw/extensions/openclaw-multi-session-plugins\"\n"
" follow: false\n"
" register: openclaw_plugin_extension_stat_macos\n"
" when: ansible_os_family == 'Darwin'\n"
"\n"
"- name: Remove legacy temporary plugin symlink (macOS)\n"
" ansible.builtin.file:\n"
" src: \"{{ gateway_openclaw_multi_session_plugin_dir | default('/tmp/openclaw-multi-session-plugins') }}\"\n"
" dest: \"{{ gateway_openclaw_home }}/.openclaw/extensions/openclaw-multi-session-plugins\"\n"
" state: link\n"
    "    path: \"{{ gateway_openclaw_home }}/.openclaw/extensions/openclaw-multi-session-plugins\"\n"
    "    state: absent\n"
    "  when:\n"
    "    - ansible_os_family == 'Darwin'\n"
    "    - openclaw_plugin_extension_stat_macos.stat.islnk | default(false)\n"
    "  notify: Restart openclaw\n"
    "\n"
    "- name: Ensure stable openclaw-multi-session-plugins directory (macOS)\n"
    "  ansible.builtin.file:\n"
    "    path: \"{{ gateway_openclaw_home }}/.openclaw/extensions/openclaw-multi-session-plugins\"\n"
    "    state: directory\n"
    "    owner: \"{{ gateway_openclaw_service_user }}\"\n"
    "    group: \"{{ gateway_openclaw_service_group }}\"\n"
    "    mode: \"0755\"\n"
    "  when: ansible_os_family == 'Darwin'\n"
    "  notify: Restart openclaw\n"
    "\n"
    "- name: Copy built openclaw-multi-session-plugins into stable directory (macOS)\n"
    "  ansible.builtin.copy:\n"
    "    src: \"{{ gateway_openclaw_multi_session_plugin_dir | default('/tmp/openclaw-multi-session-plugins') }}/{{ item }}\"\n"
    "    dest: \"{{ gateway_openclaw_home }}/.openclaw/extensions/openclaw-multi-session-plugins/\"\n"
    "    remote_src: true\n"
    "    owner: \"{{ gateway_openclaw_service_user }}\"\n"
    "    group: \"{{ gateway_openclaw_service_group }}\"\n"
    "    mode: preserve\n"
    "  loop:\n"
    "    - dist\n"
    "    - openclaw.plugin.json\n"
    "    - package.json\n"
    "  become_user: \"{{ gateway_openclaw_service_user }}\"\n"
    "  when: ansible_os_family == 'Darwin'\n"
    "  notify: Restart openclaw\n"
    "\n"
    "- name: Record stable openclaw-multi-session-plugins install (macOS)\n"
    "  ansible.builtin.command:\n"
    "    cmd: >-\n"
    "      {{ gateway_openclaw_binary_path }} plugins install\n"
    "      {{ (gateway_openclaw_home ~ '/.openclaw/extensions/openclaw-multi-session-plugins') | quote }} --force\n"
    "  environment:\n"
    "    HOME: \"{{ gateway_openclaw_home }}\"\n"
    "    PATH: \"{{ gateway_openclaw_service_path }}\"\n"
    "    OPENCLAW_NO_RESPAWN: \"1\"\n"
    "  become_user: \"{{ gateway_openclaw_service_user }}\"\n"
    "  changed_when: false\n"
    "  when: ansible_os_family == 'Darwin'\n"
    "\n"
)
if anchor in text and "Clone openclaw-multi-session-plugins repository (macOS)" not in text:
    text = text.replace(anchor, injected + anchor, 1)


@@ -0,0 +1,109 @@
#!/usr/bin/env bash
set -euo pipefail
cmdb_path=${CMDB_PATH:-cmdb/cmdb.json}
host=${MATRIX_HOST:?MATRIX_HOST is required}
ssh_key=${SSH_KEY_PATH:-"$HOME/.ssh/id_deploy"}
ssh_key="${ssh_key/#\~/$HOME}"
run_id=${GITHUB_RUN_ID:-manual}
ip="$(jq -r --arg host "$host" '.[$host].ip' "$cmdb_path")"
user="$(jq -r --arg host "$host" '.[$host].ansible_user // "root"' "$cmdb_path")"
domain="${XWORKMATE_BRIDGE_DOMAIN:-}"
if [ -z "$domain" ]; then
domain="$(jq -r --arg host "$host" '.[$host].host_vars.service_domains // ""' "$cmdb_path" | cut -d, -f1 | tr -d ' ')"
fi
if [ -z "$ip" ] || [ "$ip" = "null" ]; then
echo "::error::No IP found in ${cmdb_path} for ${host}" >&2
exit 1
fi
ssh_opts=(
-i "$ssh_key"
-o StrictHostKeyChecking=accept-new
-o ServerAliveInterval=30
-o ServerAliveCountMax=60
-o ConnectTimeout=20
-o BatchMode=yes
)
remote_dir="/tmp/xworkspace-bootstrap-${run_id}-${host//[^A-Za-z0-9_.-]/_}"
remote_env="${remote_dir}/env"
remote_log="${remote_dir}/bootstrap.log"
remote_rc="${remote_dir}/bootstrap.rc"
remote_runner="${remote_dir}/run.sh"
echo "Bootstrapping ${host} (${user}@${ip}) on-host, domain=${domain:-<none>} ..."
remote_payload="$(mktemp)"
trap 'rm -f "$remote_payload"' EXIT
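# The offline bundle is packaged from a release snapshot. When it lags behind
# playbooks main (for example, the Chrome version pin or the postgres PGDATA
# ownership fix has landed on main but the bundle has not been republished),
# the default offline=auto picks up stale playbooks and the deployment fails.
# Default to off so the on-host bootstrap git-clones the latest main online;
# switch back to auto once the offline bundle is republished to restore the
# offline speed-up.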
{
printf 'AI_WORKSPACE_OFFLINE_MODE=%q\n' "${AI_WORKSPACE_OFFLINE_MODE:-off}"
printf 'XWORKMATE_BRIDGE_DOMAIN=%q\n' "$domain"
# An empty value stays empty, so the on-host installer's resolve_unified_auth_token
# falls through to its "reuse persisted token / auto-generate" branch.
printf 'AI_WORKSPACE_AUTH_TOKEN=%q\n' "${AI_WORKSPACE_AUTH_TOKEN:-}"
printf 'DEEPSEEK_API_KEY=%q\n' "${DEEPSEEK_API_KEY:-}"
printf 'NVIDIA_API_KEY=%q\n' "${NVIDIA_API_KEY:-}"
printf 'OLLAMA_API_KEY=%q\n' "${OLLAMA_API_KEY:-}"
} > "$remote_payload"
ssh "${ssh_opts[@]}" "${user}@${ip}" "mkdir -p '$remote_dir' && chmod 700 '$remote_dir'"
scp "${ssh_opts[@]}" "$remote_payload" "${user}@${ip}:${remote_env}" >/dev/null
ssh "${ssh_opts[@]}" "${user}@${ip}" "chmod 600 '$remote_env'"
ssh "${ssh_opts[@]}" "${user}@${ip}" "cat > '$remote_runner' && chmod 700 '$remote_runner'" <<'REMOTE_SCRIPT'
#!/usr/bin/env bash
set -euo pipefail
remote_env=$1
remote_log=$2
remote_rc=$3
if [ -f "$remote_rc" ]; then
exit 0
fi
(
set +e
source "$remote_env"
export AI_WORKSPACE_OFFLINE_MODE XWORKMATE_BRIDGE_DOMAIN AI_WORKSPACE_AUTH_TOKEN DEEPSEEK_API_KEY NVIDIA_API_KEY OLLAMA_API_KEY
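# pipefail makes a failed download fail the pipeline; without it, `bash -`
# reads empty stdin, exits 0, and the run is reported as a success.
bash -lc 'set -o pipefail; curl -sfL https://install.svc.plus/ai-workspace | bash -'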
rc=$?
printf '%s\n' "$rc" > "$remote_rc"
exit "$rc"
) > "$remote_log" 2>&1 &
REMOTE_SCRIPT
ssh "${ssh_opts[@]}" "${user}@${ip}" "nohup '$remote_runner' '$remote_env' '$remote_log' '$remote_rc' >/dev/null 2>&1 &"
last_lines=0
while true; do
poll_output="$(ssh "${ssh_opts[@]}" "${user}@${ip}" "if [ -f '$remote_log' ]; then wc -l < '$remote_log'; else echo 0; fi; if [ -f '$remote_rc' ]; then cat '$remote_rc'; else echo RUNNING; fi" 2>/dev/null || true)"
line_count="$(printf '%s\n' "$poll_output" | sed -n '1p')"
rc_value="$(printf '%s\n' "$poll_output" | sed -n '2p')"
case "$line_count" in
''|*[!0-9]*) line_count=0 ;;
esac
if [ "$line_count" -gt "$last_lines" ]; then
start=$((last_lines + 1))
ssh "${ssh_opts[@]}" "${user}@${ip}" "sed -n '${start},${line_count}p' '$remote_log'" || true
last_lines="$line_count"
else
echo "[INFO] Bootstrap still running on ${host}; no new log lines."
fi
if [ "$rc_value" != "RUNNING" ] && [ -n "$rc_value" ]; then
if [ "$rc_value" = "0" ]; then
echo "[SUCCESS] Bootstrap completed on ${host}."
exit 0
fi
echo "::error::Bootstrap failed on ${host} with exit code ${rc_value}."
exit "$rc_value"
fi
sleep 20
done
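
Note on the env payload above: the script writes each variable with `printf '%q'` and the remote runner reads it back with `source`. The sketch below illustrates that round trip in isolation, assuming bash on both ends. `write_env_payload` and `read_env_value` are hypothetical helper names that do not appear in this PR.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): write NAME=<shell-quoted value>
# for every variable name passed after the output path.
write_env_payload() {
  local out=$1 name
  shift
  for name in "$@"; do
    # ${!name:-} expands the variable called $name, or empty when it is unset.
    printf '%s=%q\n' "$name" "${!name:-}"
  done > "$out"
}

# Hypothetical helper: source the payload in a subshell and print one variable,
# so nothing leaks into the calling shell.
read_env_value() {
  local file=$1 name=$2
  (
    # shellcheck disable=SC1090
    source "$file"
    printf '%s' "${!name:-}"
  )
}
```

Because `%q` escapes spaces, quotes, and `$`, the sourced value matches the original, and an empty value is written as `''` and stays empty, which is what lets the installer fall back to its own token handling.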


@@ -1085,11 +1085,15 @@ patch_playbooks_for_macos() {
 info "Fetching and running macOS playbook patches..."
 local patch_script="/tmp/patch-macos-playbooks.py"
 local raw_url="https://raw.githubusercontent.com/ai-workspace-lab/xworkspace-console/main/scripts/patch-macos-playbooks.py"
+local local_patch_script
+local_patch_script="${XWORKSPACE_CONSOLE_DIR}/scripts/patch-macos-playbooks.py"
-if command -v curl >/dev/null 2>&1; then
-curl -sfL -o "$patch_script" "$raw_url"
+if [ -f "$local_patch_script" ]; then
+cp "$local_patch_script" "$patch_script"
+elif command -v curl >/dev/null 2>&1; then
+curl -sfL -o "$patch_script" "${raw_url}?rev=$(date +%s)"
 else
-wget -qO "$patch_script" "$raw_url"
+wget -qO "$patch_script" "${raw_url}?rev=$(date +%s)"
 fi
 if [ -f "$patch_script" ]; then
@@ -2017,12 +2021,12 @@ append_var "LITELLM_SOURCE_REPO" "litellm_source_repo"
 append_var "LITELLM_VERSION" "litellm_version"
 append_var "OPENCLAW_MULTI_SESSION_PLUGIN_PACKAGE_SPEC" "gateway_openclaw_multi_session_plugin_package_spec"
-append_var "DEEPSEEK_API_KEY" "litellm_deepseek_api_key"
-append_var "NVIDIA_API_KEY" "litellm_nvidia_api_key"
-append_var "OLLAMA_API_KEY" "litellm_ollama_api_key"
-append_var "GEMINI_API_KEY" "litellm_gemini_api_key"
-append_var "OPENAI_API_KEY" "litellm_openai_api_key"
-append_var "ANTHROPIC_API_KEY" "litellm_anthropic_api_key"
+append_secret_var "litellm_deepseek_api_key" "${DEEPSEEK_API_KEY:-}"
+append_secret_var "litellm_nvidia_api_key" "${NVIDIA_API_KEY:-}"
+append_secret_var "litellm_ollama_api_key" "${OLLAMA_API_KEY:-}"
+append_secret_var "litellm_gemini_api_key" "${GEMINI_API_KEY:-}"
+append_secret_var "litellm_openai_api_key" "${OPENAI_API_KEY:-}"
+append_secret_var "litellm_anthropic_api_key" "${ANTHROPIC_API_KEY:-}"
 # 4. Resolve one auth token for the bridge and downstream service UIs/APIs.
 UNIFIED_AUTH_TOKEN="$(resolve_unified_auth_token)"
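
Note on the hunk above: the provider API keys move from `append_var` to `append_secret_var`, whose body is not part of this diff. The sketch below only illustrates the general idea of a masked logger and is an assumption throughout. `demo_append_secret_var`, `DEMO_EXTRA_VARS`, the fixed mask, and the choice to skip empty values are all hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the collected Ansible extra vars.
DEMO_EXTRA_VARS=()

# Hypothetical masked logger: store the real value, log only a fixed mask.
demo_append_secret_var() {
  local ansible_var=$1 value=$2
  # Assumption: unset or empty secrets are skipped entirely.
  [ -n "$value" ] || return 0
  DEMO_EXTRA_VARS+=("${ansible_var}=${value}")
  # The log line carries the variable name and a mask, never the value.
  printf '[INFO] %s=********\n' "$ansible_var"
}
```

A fixed-length mask also avoids leaking the length of the secret, which a per-character mask would reveal.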


@@ -0,0 +1,100 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
OUT_DIR="${XWORKSPACE_MCP_OUT_DIR:-$ROOT_DIR/config/mcp}"
BIN_DIR="${OUT_DIR}/bin"
ENV_FILE="${OUT_DIR}/local-mcp.env"
PROFILE_FILE="${OUT_DIR}/local-mcp-config.json"
mkdir -p "$OUT_DIR" "$BIN_DIR"
have() { command -v "$1" >/dev/null 2>&1; }
need_cmd() {
local cmd="$1"
if ! have "$cmd"; then
printf '[ERROR] missing required command: %s\n' "$cmd" >&2
exit 1
fi
}
need_cmd docker
if [ -n "${GITHUB_PERSONAL_ACCESS_TOKEN:-}" ]; then
umask 077
cat >"$ENV_FILE" <<EOF
GITHUB_PERSONAL_ACCESS_TOKEN=${GITHUB_PERSONAL_ACCESS_TOKEN}
EOF
elif [ ! -f "$ENV_FILE" ]; then
cat <<EOF >&2
[ERROR] GITHUB_PERSONAL_ACCESS_TOKEN is required.
Set it in your shell once, or create:
${ENV_FILE}
EOF
exit 1
fi
cat >"${BIN_DIR}/github-mcp-server.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
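# This wrapper lives in <out_dir>/bin, so the env file sits one directory up.
MCP_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
# Export while sourcing: `docker run -e NAME` only forwards exported variables.
set -a
. "${MCP_DIR}/local-mcp.env"
set +a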
exec docker run --rm -i \
-e GITHUB_PERSONAL_ACCESS_TOKEN \
-e GITHUB_TOOLSETS=default,actions \
ghcr.io/github/github-mcp-server:latest
EOF
cat >"${BIN_DIR}/terraform-mcp-server.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
exec docker run --rm -i \
ghcr.io/hashicorp/terraform-mcp-server:latest \
--toolsets=registry
EOF
cat >"${BIN_DIR}/mcp-ssh-manager.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
exec npx -y mcp-ssh-manager@latest
EOF
chmod +x "${BIN_DIR}/github-mcp-server.sh" "${BIN_DIR}/terraform-mcp-server.sh" "${BIN_DIR}/mcp-ssh-manager.sh"
if ! have ansible-galaxy; then
printf '[WARN] ansible-galaxy not found; skipping ansible.mcp collection install.\n' >&2
else
printf '[INFO] installing ansible.mcp collection...\n' >&2
ansible-galaxy collection install ansible.mcp ansible.utils >/dev/null
fi
cat >"$PROFILE_FILE" <<JSON
{
"mcpServers": {
"github": {
"command": "${BIN_DIR}/github-mcp-server.sh"
},
"terraform": {
"command": "${BIN_DIR}/terraform-mcp-server.sh"
},
"ssh-manager": {
"command": "${BIN_DIR}/mcp-ssh-manager.sh"
}
}
}
JSON
cat <<EOF
[SUCCESS] wrote ${ENV_FILE}
[SUCCESS] wrote ${PROFILE_FILE}
Point your MCP client at:
${PROFILE_FILE}
Notes:
- The GitHub wrapper loads the token from ${ENV_FILE}, so you only need to run this once.
- Terraform MCP stays on the minimal registry toolset.
- SSH Manager runs through npx to avoid a global install.
- ansible.mcp is a collection dependency, not a standalone MCP daemon.
EOF
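
Note on the GitHub wrapper: `docker run -e NAME` (with no `=value`) forwards `NAME` from the caller's environment, so a variable that is only sourced from an env file must also be exported. The sketch below shows one way to do that with `set -a`. `load_env_exported` is a hypothetical helper name, not something defined in this PR.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): source an env file so that every
# variable it assigns is exported to child processes.
load_env_exported() {
  local env_file=$1
  # `set -a` marks each variable assigned while it is active for export.
  set -a
  # shellcheck disable=SC1090
  . "$env_file"
  set +a
}
```

Without `set -a` (or an explicit `export`), the sourced variable exists only in the wrapper's shell and a child process such as `docker` would not see it.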


@@ -217,6 +217,60 @@ test_linux_identity_vars_can_be_overridden() (
printf '%s\n' "${ANSIBLE_EXTRA_VARS[@]}" | grep -q '^xworkspace_console_repo_dir=/srv/deploy/xworkspace-console$' || fail "console repo extra var missing"
)
test_provider_api_keys_use_secret_logging() {
local env_name ansible_var
while read -r env_name ansible_var; do
grep -Fq "append_secret_var \"$ansible_var\" \"\${$env_name:-}\"" "$BOOTSTRAP" ||
fail "$env_name is not passed through the masked secret logger"
if grep -Fq "append_var \"$env_name\"" "$BOOTSTRAP"; then
fail "$env_name is still passed through the plain-text parameter logger"
fi
done <<'EOF'
DEEPSEEK_API_KEY litellm_deepseek_api_key
NVIDIA_API_KEY litellm_nvidia_api_key
OLLAMA_API_KEY litellm_ollama_api_key
GEMINI_API_KEY litellm_gemini_api_key
OPENAI_API_KEY litellm_openai_api_key
ANTHROPIC_API_KEY litellm_anthropic_api_key
EOF
}
test_macos_plugin_patch_uses_stable_directory() {
local patcher="$SCRIPT_DIR/../scripts/patch-macos-playbooks.py"
grep -Fq 'Remove legacy temporary plugin symlink (macOS)' "$patcher" ||
fail "macOS plugin patch does not migrate the legacy temporary symlink"
grep -Fq 'Ensure stable openclaw-multi-session-plugins directory (macOS)' "$patcher" ||
fail "macOS plugin patch does not create a stable extension directory"
grep -Fq 'Copy built openclaw-multi-session-plugins into stable directory (macOS)' "$patcher" ||
fail "macOS plugin patch does not copy the built plugin into stable storage"
grep -Fq 'Record stable openclaw-multi-session-plugins install (macOS)' "$patcher" ||
fail "macOS plugin patch does not record stable OpenClaw provenance"
if grep -Fq 'Link openclaw-multi-session-plugins to extensions (macOS)' "$patcher"; then
fail "macOS plugin patch still installs the extension as a temporary symlink"
fi
}
test_local_bootstrap_prefers_local_macos_patcher() {
local checkout workdir
checkout="$(mktemp -d)"
workdir="$(mktemp -d)"
mkdir -p "$checkout/scripts"
printf 'local-patcher-marker\n' > "$checkout/scripts/patch-macos-playbooks.py"
(
XWORKSPACE_CONSOLE_DIR="$checkout"
cd "$workdir"
# shellcheck disable=SC2329
python3() {
grep -q '^local-patcher-marker$' "$1" ||
fail "patch function did not execute the checked-in patcher"
}
patch_playbooks_for_macos
)
rm -rf "$checkout" "$workdir"
grep -Fq '"${raw_url}?rev=$(date +%s)"' "$BOOTSTRAP" ||
fail "remote macOS patcher download is not cache-busted"
}
test_root_does_not_require_sudo
printf 'ok - root execution does not require sudo\n'
test_non_root_uses_sudo
@@ -246,3 +300,9 @@ test_linux_non_root_uses_current_user_home
printf 'ok - Linux non-root deployment uses passwd home\n'
test_linux_identity_vars_can_be_overridden
printf 'ok - Linux deployment identity can be overridden\n'
test_provider_api_keys_use_secret_logging
printf 'ok - provider API keys use masked secret logging\n'
test_macos_plugin_patch_uses_stable_directory
printf 'ok - macOS plugin patch uses stable extension storage\n'
test_local_bootstrap_prefers_local_macos_patcher
printf 'ok - local bootstrap prefers the checked-in macOS patcher\n'
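
Note on `test_local_bootstrap_prefers_local_macos_patcher`: the test replaces `python3` with a shell function inside a subshell, which works because bash resolves functions before searching `PATH`. The sketch below isolates that stubbing technique. `run_patcher` and `with_stubbed_python3` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the function under test: it just runs python3.
run_patcher() {
  python3 "$1"
}

# Run run_patcher in a subshell where python3 is a shell function that only
# reports its first argument. The stub disappears when the subshell exits.
with_stubbed_python3() {
  local script=$1
  (
    python3() { printf 'stubbed python3 called with %s\n' "$1"; }
    run_patcher "$script"
  )
}
```

Keeping the stub inside a subshell means later tests in the same file still see the real `python3`, if one is installed.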