ci+docs: on-host bootstrap deploy job + console serving/verification updates

- deploy-ai-workspace-iac.yaml: the deploy job now SSHes to each host and runs the official curl|bash bootstrap locally (host-side ansible -c local, offline-accelerated), instead of running all-in-one from the runner (which breaks on the delegate_to: localhost in roles/agent_skills). The provision job is kept as the batch-provision mode.
- docs/operations: record the final console fix (local python static backend), the caddy/public-access architecture, and the debian13/ubuntu26.04/macOS verification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent e47b15a5f0
commit b2c8c5d875

58 .github/workflows/deploy-ai-workspace-iac.yaml (vendored)
@@ -3,10 +3,13 @@ name: Deploy AI Workspace (IaC + Ansible + Cloudflare)
 # =============================================================================
 # Final deployment pipeline linking IaC with the dynamic Ansible inventory (matrix mode)
 #
-# provision : creates hosts with vultr-vps/envs/ai-workspace (Python+Jinja2 renders explicit
+# provision : batch-provision mode (switches: terraform_action=apply / run_deploy).
+#             Creates hosts with vultr-vps/envs/ai-workspace (Python+Jinja2 renders explicit
 #             HCL, no for_each), exports cmdb.json + inventory.ini, and from them dynamically
 #             generates the downstream deploy matrix.
-# deploy    : matrix runs per host in parallel; deploys AI Workspace with the Ansible all-in-one playbook.
+# deploy    : matrix runs per host in parallel; SSHes to each host and runs the official bootstrap locally (curl|bash → host-side
+#             ansible -c local, automatically accelerated by the offline bundle). Same path as a user self-host install;
+#             all-in-one is NOT run remotely from the runner (it would hit agent_skills' delegate_to: localhost).
 # dns       : after deployment, syncs Cloudflare DNS from the inventory's service_domains/IPs.
 #
 # The data contract cmdb.json is produced by generate.py in ai-workspace-infra and shared by all three jobs.
@@ -172,7 +175,7 @@ jobs:

   # ---------------------------------------------------------------------------
   deploy:
-    name: Deploy ${{ matrix.host }}
+    name: Deploy ${{ matrix.host }} (on-host bootstrap)
     needs: provision
     if: ${{ needs.provision.outputs.count != '0' && (github.event.inputs.run_deploy == 'true' || github.event.inputs.run_deploy == null) }}
     runs-on: ubuntu-latest
@@ -181,27 +184,16 @@ jobs:
       matrix:
         host: ${{ fromJSON(needs.provision.outputs.hosts) }}
     steps:
-      - name: Checkout infra (playbooks)
-        uses: actions/checkout@v4
-        with:
-          repository: ${{ env.INFRA_REPO }}
-          ref: ${{ github.event.inputs.infra_ref || 'main' }}
-          token: ${{ secrets.INFRA_REPO_TOKEN || github.token }}
-          path: infra
-
-      - name: Download CMDB + inventory
+      # all-in-one follows an "execute locally on the target host" model (host-side ansible-playbook -c local,
+      # automatically accelerated by the offline bundle). Running all-in-one remotely from the runner hits
+      # roles/agent_skills' delegate_to: localhost (which writes to the runner's local /root), so deploy now SSHes
+      # to each host and runs the official bootstrap script there: the exact same curl|bash path as a user self-host.
+      - name: Download CMDB (host IP source)
         uses: actions/download-artifact@v4
         with:
           name: ai-workspace-cmdb
           path: cmdb

-      - uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
-      - name: Install Ansible
-        run: pip install --quiet ansible
-
       - name: Configure SSH
         run: |
           set -euo pipefail
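The polling loop of the Configure SSH step is elided from this diff; only its closing `done` and the timeout error appear in the next hunk. A minimal sketch of such a wait-for-SSH loop, assuming bash's `/dev/tcp` redirection and coreutils `timeout` are available on the runner; `wait_for_port` and its retry/delay parameters are hypothetical, not the workflow's actual code:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a wait-for-SSH loop (the workflow's real loop is elided).
# Requires bash (for /dev/tcp) and coreutils `timeout`.

# wait_for_port HOST PORT [TRIES] [DELAY_SECONDS]
# Returns 0 as soon as a TCP connection to HOST:PORT succeeds, 1 after TRIES failures.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" delay="${4:-10}" i
  for ((i = 1; i <= tries; i++)); do
    # Opening fd 3 on /dev/tcp/HOST/PORT performs a TCP connect; `timeout`
    # bounds hosts that silently drop packets instead of refusing them.
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  # Same GitHub Actions error annotation format as the workflow's own message.
  echo "::error::Timed out waiting for ${host}:${port}" >&2
  return 1
}

# Usage in a step (hypothetical): wait_for_port "$ip" 22 || exit 1
```

Probing the TCP port only proves sshd is listening; the first real `ssh` call can still fail if cloud-init has not installed the key yet, so a retry around the first login may also be worth keeping.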
@@ -220,27 +212,25 @@ jobs:
           done
           echo "::error::Timed out waiting for ${ip}:22"; exit 1

-      - name: Ansible deploy (${{ github.event.inputs.playbook || 'setup-ai-workspace-all-in-one.yml' }})
-        working-directory: ${{ env.PLAYBOOKS_DIR }}
+      - name: Run on-host bootstrap (curl | bash, local-mode install)
         env:
-          ANSIBLE_HOST_KEY_CHECKING: "False"
-          # With Python 3.13 targets (Debian 13 / Ubuntu 26.04) the ansible apt module emits a
-          # DeprecationWarning; in pipelining mode that stderr pollutes the module's return value → UNREACHABLE.
-          # Disable pipelining to keep stderr separate, and silence the warnings.
-          ANSIBLE_PIPELINING: "False"
-          PYTHONWARNINGS: "ignore"
           DEEPSEEK_API_KEY: ${{ secrets.DEEPSEEK_API_KEY }}
           NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
           OLLAMA_API_KEY: ${{ secrets.OLLAMA_API_KEY }}
         run: |
           set -euo pipefail
-          # -e overrides the private key: playbooks/group_vars/all.yml pins ansible_ssh_private_key_file
-          # to id_rsa, which would override --private-key; extra-vars have the highest precedence.
-          ansible-playbook \
-            -i "${GITHUB_WORKSPACE}/cmdb/inventory.ini" \
-            --limit "${{ matrix.host }}" \
-            -e "ansible_ssh_private_key_file=${HOME}/.ssh/id_ed25519" \
-            "${{ github.event.inputs.playbook || 'setup-ai-workspace-all-in-one.yml' }}"
+          ip="$(jq -r '.["${{ matrix.host }}"].ip' cmdb/cmdb.json)"
+          user="$(jq -r '.["${{ matrix.host }}"].ansible_user // "root"' cmdb/cmdb.json)"
+          echo "Bootstrapping ${{ matrix.host }} (${user}@${ip}) on-host ..."
+          ssh -i ~/.ssh/id_ed25519 \
+            -o StrictHostKeyChecking=accept-new \
+            -o ServerAliveInterval=20 -o ServerAliveCountMax=15 \
+            -o ConnectTimeout=20 \
+            "${user}@${ip}" \
+            "DEEPSEEK_API_KEY='${DEEPSEEK_API_KEY}' \
+             NVIDIA_API_KEY='${NVIDIA_API_KEY}' \
+             OLLAMA_API_KEY='${OLLAMA_API_KEY}' \
+             bash -lc 'curl -sfL https://install.svc.plus/ai-workspace | bash -'"

   # ---------------------------------------------------------------------------
   dns:
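The remote command above wraps each secret in single quotes, so a key containing `'` would break the command. Also, the remote side runs `curl ... | bash -` without `pipefail`: if the download fails, bash receives empty input, exits 0, and the step reports success. A possible hardening is sketched below. It assumes the remote login shell is bash (the default for root on Debian/Ubuntu), because `printf %q` emits bash-style quoting. `build_remote_cmd` is a hypothetical helper, not part of the workflow:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): builds the command string passed to `ssh`.
# Assumes the remote login shell is bash, because printf %q emits bash-style quoting.

# build_remote_cmd URL [NAME=VALUE ...]
build_remote_cmd() {
  local url="$1"; shift
  local assignments=() pair
  for pair in "$@"; do
    # Split on the first '=' and shell-quote only the value, so quotes,
    # spaces or '$' in a secret cannot break out of the assignment.
    assignments+=("${pair%%=*}=$(printf '%q' "${pair#*=}")")
  done
  # pipefail makes a failed curl fail the remote command instead of feeding
  # empty input to bash (which would exit 0 and look like success).
  printf '%s bash -lc %q\n' "${assignments[*]}" \
    "set -o pipefail; curl -fsSL ${url} | bash -"
}

# Usage in the workflow step (hypothetical):
#   ssh -i ~/.ssh/id_ed25519 "${user}@${ip}" \
#     "$(build_remote_cmd https://install.svc.plus/ai-workspace \
#         "DEEPSEEK_API_KEY=${DEEPSEEK_API_KEY}" \
#         "NVIDIA_API_KEY=${NVIDIA_API_KEY}" \
#         "OLLAMA_API_KEY=${OLLAMA_API_KEY}")"
```

With this approach the secrets remain visible in the remote host's process list while the bootstrap runs. ssh's `SendEnv` option would avoid that, but it only works if the server's `sshd_config` lists those variables under `AcceptEnv`.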
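@@ -47,14 +47,27 @@ config/resources/ai-workspace-hosts.yaml (IaC declaration, the only manual entry point)
 | 3 | Large xfce4 package install drops the SSH session | "Install runtime packages" UNREACHABLE (the packages actually finished installing) | Add `async/poll` to that apt task |
 | 4 | ai_agent_runtime apt tasks contend for the dpkg lock | texlive/pandoc `Could not get lock` | Add `lock_timeout` in `roles/ai_agent_runtime/tasks/{main,docs,fonts,browser}.yml` |
 | 5 | Wrong console API binary path | `api/xworkspace-api` does not exist → `203/EXEC` crash-restart loop | manifest `apiBinary: bin/xworkspace-api`; change `api_dir` in `setup-xworkspace-console.yaml` to `bin/` |
+| 6 | Wrong console serving method | The prebuilt runtime ships only `dashboard/dist` (no package.json), so `npm run preview` fails with ENOENT (254) → crash-restart loop | The console is a **local static backend** on `127.0.0.1:17000` (the dashboard is a single page with no routing); Linux `console.service` and macOS `console.plist` both use `python3 -m http.server --directory dist`; do **not** start a second caddy (it would fight the system caddy for :80) |
+| 7 | caddy installation not controlled by a switch | The console play runs `apt install caddy` unconditionally | caddy in the apt list is gated by `caddy_enabled` (default true on VPS; off → not installed; macOS has no apt, so it is never installed there) |

 Deploy-side hardening (stability of long-haul control connections): add `ServerAliveInterval`/`ControlPersist` to `ANSIBLE_SSH_ARGS`, and set `ANSIBLE_SSH_RETRIES`.

-## 5. To-do (remaining)
+## 4b. Public exposure / caddy architecture

-- **console.service serving method**: the prebuilt runtime ships only `dashboard/dist` (no package.json),
-  but the service runs `npm run preview` → ENOENT crash-restart. It must be changed to **statically serve dist** (candidates: the api binary's built-in serving / caddy file_server / a static server); this is an app design decision, so it was not changed unilaterally.
-- **deploy pipeline**: change the deploy job in `deploy-ai-workspace-iac.yaml` from "runner runs all-in-one" to "use the inventory to run the curl|bash bootstrap on the host" (fits the local-execution model + offline acceleration).
+- **caddy is the unified reverse-proxy front end** (:80/:443); each **public-facing** service gets its own `/etc/caddy/conf.d/*.caddy` (`reverse_proxy` to its local port).
+- **Linux VPS (public IP)**: by default only `XWORKMATE_BRIDGE_PUBLIC_ACCESS` is on (passed via `-e`) → the bridge gets a conf.d entry and is public; the console (17000) and other `*_PUBLIC_ACCESS` flags default to false → **local-only, no conf.d entry**. `caddy_enabled` defaults to true → caddy is installed.
+- **macOS local machine**: no public services need exposing and everything stays on the internal network, so `caddy_enabled=false` → **caddy is not installed**; the console is likewise served locally by python.
+- Note that `:80` is held by **the default `/etc/caddy/Caddyfile` (`:80 {}`) shipped with the apt caddy package**. The earlier approach of running the console as a second caddy conflicted because auto-HTTPS reserves :80, so the console became a python static backend that the system caddy reverse-proxies (or does not proxy at all when local-only).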
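In this layout, each public-facing service gets its own file under `/etc/caddy/conf.d/` that reverse-proxies to the service's local port. The sketch below renders such a file in bash. The `render_caddy_site` helper, the example domain, and the port are hypothetical. It also assumes the main `/etc/caddy/Caddyfile` pulls the fragments in with an `import /etc/caddy/conf.d/*.caddy` line; the notes do not show how the role wires this up:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: render one /etc/caddy/conf.d/<name>.caddy site block.
# Assumes the main Caddyfile contains: import /etc/caddy/conf.d/*.caddy

# render_caddy_site DOMAIN LOCAL_PORT
render_caddy_site() {
  local domain="$1" port="$2"
  # A named site address lets caddy's automatic HTTPS manage a certificate
  # for DOMAIN; reverse_proxy forwards to the service on loopback only.
  printf '%s {\n\treverse_proxy 127.0.0.1:%s\n}\n' "$domain" "$port"
}

# Example (hypothetical values): only services whose *_PUBLIC_ACCESS flag is
# true get a fragment; local-only services such as the console get none.
#   render_caddy_site bridge.example.com 8080 > /etc/caddy/conf.d/bridge.caddy
#   caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile && systemctl reload caddy
```

Keeping one fragment per service means that turning a `*_PUBLIC_ACCESS` flag off only requires deleting that service's file and reloading. The shared Caddyfile and the other services are left untouched.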
+
+## 5. Verification results (closed)
+
+| Platform | console | api | 17000 (HTTP) |
+|------|---------|-----|-------|
+| debian13 (caddy_enabled=true) | active (python3) | active | 200 `<title>XWorkspace Dashboard</title>` |
+| ubuntu26.04 | active (python3) | active | 200 |
+| macOS local machine (python3 serving dist) | - | - | 200 `<title>XWorkspace Dashboard</title>` |
+
+- deploy pipeline: the deploy job in `deploy-ai-workspace-iac.yaml` now "SSHes to each host and runs the curl|bash bootstrap locally" (fits the local-execution model + offline acceleration); the provision job is kept as the batch-provision mode.

 ## 6. Offline/online fallback confirmation

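The verification table checks that each platform answers on port 17000 with HTTP 200 and the `XWorkspace Dashboard` title. The bash sketch below performs that check. It assumes `curl` is installed on the host. `extract_title` and `check_console` are hypothetical helpers, and the default URL only mirrors the port in the table:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the console check behind the verification table.

# extract_title: read HTML on stdin, print the text of the first <title> element
# (assumes the title sits on a single line, as in a typical built index.html).
extract_title() {
  sed -n 's:.*<title>\([^<]*\)</title>.*:\1:p' | head -n 1
}

# check_console [URL]: succeed only on HTTP 200 with the expected page title.
check_console() {
  local url="${1:-http://127.0.0.1:17000/}" out code body
  # -w appends the status code on its own final line so it can be split off.
  out="$(curl -sS --max-time 10 -w '\n%{http_code}' "$url")" || return 1
  code="${out##*$'\n'}"   # text after the last newline: the status code
  body="${out%$'\n'*}"    # everything before it: the response body
  [[ "$code" == 200 ]] || return 1
  [[ "$(printf '%s\n' "$body" | extract_title)" == "XWorkspace Dashboard" ]]
}

# Per row 6 of the fix table, the console is served by something like:
#   python3 -m http.server 17000 --bind 127.0.0.1 --directory dist
```

Checking the title as well as the status code catches the case where something else, such as the default caddy site on :80 being proxied by mistake, answers 200 with the wrong page.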