- Stop checking out the old private mono-repo `ai-workspace-infra`.
- Checkout the split public repositories `ai-workspace-infra/playbooks` and `ai-workspace-infra/iac_modules` separately.
- Remove `CODEX_GITHUB_PERSONAL_ACCESS_TOKEN` (`INFRA_REPO_TOKEN`) dependency from vault as it's no longer needed for public repos.
Documents the YAML->generate.py->terraform->cmdb.json->ansible flow, the FQDN
inventory_hostname contract, the two execution models, the Vault-OIDC pipeline,
the non-empty/fail-fast checks, and the key fixes that make it work end to end.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a 'Validate required secrets' run-step after each job's Vault OIDC
load step. It checks REQUIRED steps.vault.outputs.* are non-empty via
env: mapping (never echoes secret values), and on any empty key prints a
::error:: naming the key + its Vault path then exit 1. The deploy job
requires at least one of ANSIBLE_SSH_KEY_B64 / ANSIBLE_SSH_KEY. Optional
keys (INFRA_REPO_TOKEN, TF_STATE_*) are not validated. Vault path strings
in error messages reference the env.VAULT_KV[_OPENCLAW] vars rather than
hardcoded literals.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
On-host ansible-playbook -c local now uses XWORKMATE_BRIDGE_DOMAIN (sourced from
CMDB service_domains via the pipeline) or the host FQDN as inventory_hostname,
falling back to 127.0.0.1 only when no valid FQDN exists. Keeps -c local.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New optional 'bridge_domain' input overrides; otherwise derive from each host's
cmdb.json host_vars.service_domains (first entry) and inject as
XWORKMATE_BRIDGE_DOMAIN so the host sets /etc/hostname + xworkmate-bridge.caddy
from it (on-host model has no inventory hostvars).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
DEEPSEEK/NVIDIA/OLLAMA_API_KEY live in kv/data/openclaw (not CICD); vault-action
reads them from that path in the same step. Policy grants read on both
kv/data/CICD and kv/data/openclaw.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- VAULT_KV -> kv/data/CICD (shared CICD secrets), map existing keys to outputs:
CODEX_GITHUB_PERSONAL_ACCESS_TOKEN->INFRA_REPO_TOKEN,
SSH_PRIVATE_DEPLOY_KEY[_B64]->ANSIBLE_SSH_KEY[_B64],
CLOUDFLARE_DNS_API_TOKEN direct; VULTR_API_KEY/LLM keys same name.
- docs: policy reads kv/data/CICD; field table maps existing keys; note the
three LLM keys still need to be added to kv/CICD, and SSH_PUBLIC_DEPLOY_KEY
must match hosts.yaml.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- deploy job: read ANSIBLE_SSH_KEY_B64 (preferred) + ANSIBLE_SSH_KEY (fallback)
from Vault, decode/write ~/.ssh/id_deploy and ssh-keygen -y self-check —
matches the org SSH-deploy runbook (avoids multiline-key libcrypto errors).
- docs/operations/vault-github-actions.md: full Vault role/policy/jwt/KV setup
for github-actions-xworkspace-console, mirroring the existing org records.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace GitHub Actions Secrets with HashiCorp Vault (https://vault.svc.plus):
- permissions: id-token: write; auth via hashicorp/vault-action@v2 (method=jwt,
role=github-actions-xworkspace-console, audience=vault) — no static token.
- Each job loads only the keys it needs from kv/data/github-actions/xworkspace-console
(VULTR_API_KEY, INFRA_REPO_TOKEN, ANSIBLE_SSH_KEY, CLOUDFLARE_API_TOKEN,
DEEPSEEK/NVIDIA/OLLAMA_API_KEY, optional TF_STATE_*).
- Backend gating now keys off the Vault output (steps.vault.outputs.TF_STATE_BUCKET).
- Drop unused 'playbook' input (deploy is on-host bootstrap).
Pattern mirrors xworkmate-app/.github/workflows/build-and-release.yml.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Expand the all-in-one setup guide (zh+en) into a full reference of the
bootstrap script's supported options, grouped by purpose: subcommands
(uninstall/--purge), public-exposure & security, unified auth-token chain,
runtime modes, offline package, performance/locks, source/version overrides.
Fix the inaccurate TOKEN var -> AI_WORKSPACE_AUTH_TOKEN (the real precedence
chain). Sourced from scripts/setup-ai-workspace-all-in-one.sh.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- deploy-ai-workspace-iac.yaml: deploy job now ssh-es to each host and runs
the official curl|bash bootstrap locally (host-side ansible -c local,
offline-accelerated), instead of running all-in-one from the runner (which
breaks on roles/agent_skills delegate_to: localhost). provision job kept as
the batch-provision mode.
- docs/operations: record final console fix (local python static backend),
caddy/public-access architecture, and debian13/ubuntu26.04/macOS verification.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Records the IaC->inventory->deploy linkage, offline-package linkage
verification, the local-on-host execution finding, the 5 fixes applied to
playbooks, and the remaining console static-serve + pipeline TODOs.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
generate.py moved to vultr-vps/scripts/ and provider/variables/cloud-init to
templates/; run render/inventory from VPS_ROOT via scripts/generate.py, keep
terraform -chdir in the env workdir.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Matrix pipeline that provisions Vultr hosts via iac_modules vultr-vps
ai-workspace env (Terraform), derives the deploy matrix from the rendered
CMDB, deploys per-host with Ansible all-in-one, then syncs Cloudflare DNS.
Pipelining off + PYTHONWARNINGS=ignore for Python 3.13 targets.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The ai-workspace role's final-deployment step downloads the console runtime
from a stable latest-runtime release (matching the bridge/qmd/litellm
convention). Have the publish job refresh a moving `latest-runtime` release
alongside the immutable `runtime-<sha>` one, carrying the same cross-compiled
assets (darwin-arm64, linux-amd64, linux-arm64) + SHA256SUMS, so consumers get
a predictable URL:
releases/download/latest-runtime/xworkspace-console-runtime-<os>-<arch>.tar.gz
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Inject pip --retries/--resume-retries into the cloned litellm install task,
tolerate empty version-probe stdout via default('{}', true), and guard the
OpenClaw download/extract patches so a second pass cannot append a duplicate
`when:` (invalid YAML). Ignore scripts/__pycache__.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Document the six macOS issues found and fixed during end-to-end
verification of the all-in-one install: litellm dependency version-probe
SyntaxError (TC-028), prisma generator PATH (TC-029), QMD plist undefined
nodejs_version (TC-030), QMD better-sqlite3 Node ABI mismatch (TC-031),
XFCE/XRDP apt-on-macOS (TC-032), and litellm DATABASE_URL password
percent-encoding / P1013 (TC-033), each with its playbooks commit. Update
the fix-dimension summary and the runtime delivery plan status.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`uninstall` / `uninstall --purge` previously removed services and (on
purge) `rm -rf`'d a hand-maintained list of paths with no output, so users
could not see what would be — or had been — deleted (TC-MAC-026).
Add a pre-flight `print_uninstall_summary` that lists the apps/services to
be removed (launchd agents on macOS; systemd units + docker containers on
Linux) and, when --purge is set, every target path with its current
[present]/[absent] status. Centralize the purge paths into a single
source-of-truth inventory and route deletions through a `purge_path`
helper that prints `removed:` / `absent (skipped):` per path. Document the
subcommands in the usage header. Behavior is otherwise unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Mirror the playbooks fix for the curl|bash clone path: rewrite
default('{}') | from_json to default('{}', true) | from_json so an empty
probe stdout no longer crashes the install-decision set_fact.
- OpenClaw download sub-patch re-applied on a 2nd pass produced a duplicate when: key (invalid YAML); guard against the already-patched form.
- OpenClaw extract_old drifted from upstream (missing creates:, notify now a list) so the Darwin guard silently never applied and the task tried to unarchive a tarball never downloaded on macOS; realign + idempotent guard.
- Inject pip --retries/--resume-retries + longer timeout into the LiteLLM dependency install for the curl|bash clone path.
Build the dashboard dist once (platform-independent) and cross-compile the Go
API (CGO disabled) for darwin-arm64, linux-amd64, linux-arm64 in a single
ubuntu job. Drops the macOS runners and per-arch docker. node_modules is
excluded from the runtime (nodejs role provides Node on target), so the
tarballs only carry the API binary + built dist + scripts + manifest. macOS is
arm64-only; Linux covers amd64/arm64 (Debian and Ubuntu share the binary).