Commit Graph

449 Commits

Author SHA1 Message Date
Haitao Pan
c3f3b8ac8e refactor(agent_skills): run on target host, git-clone sources, drop delegate_to localhost
Make the role work identically under both execution models:
- local/pull (curl|bash -> ansible-playbook -c local; localhost == host)
- remote controller (ansible-playbook -i inventory over ssh; tasks run on host)

Changes:
- Remove ALL delegate_to: localhost. The old raw 'command: rsync' chose
  local-vs-remote via ansible_connection, but delegate_to localhost forced it
  to 'local', so the user@host push branch was dead code: remote runs wrote
  to the controller's /root and failed.
- Acquire xworkspace-core-skills via ansible.builtin.git clone ON THE HOST
  (most universal/cross-platform), instead of requiring a controller-side dir.
- Merge core skills into the canonical dir with ansible.builtin.copy
  (remote_src, host-local) instead of raw rsync; installer adapters install
  directly into the canonical dir on the host.
- Drop rsync-only vars/excludes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 14:57:49 +08:00
Haitao Pan
2ef144d572 fix(console): serve dashboard/dist via local python http.server (not npm/caddy)
The prebuilt runtime ships only dashboard/dist (no package.json), so `npm run
preview` crash-loops with ENOENT (254). console is a local-only static backend on
127.0.0.1:17000 (dashboard is a routerless SPA); serve it with python3
-m http.server on both Linux (console.service) and macOS (console.plist) —
no second caddy (avoids clashing with the system caddy on :80; console is
local-only and not proxied by default). Gate the apt caddy install on
caddy_enabled (true on public-IP Linux VPS for the bridge ingress; macOS
installs no caddy).

Verified: on debian13 + ubuntu26.04, console.service is active and serves 200 on :17000;
macOS python3 serves the same dist locally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 09:44:01 +08:00
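The serving model in the commit above can be sketched with Python's standard library alone. This is a minimal sketch, not the role's actual service unit: `serve_static` is a hypothetical helper, and only the loopback address and port 17000 come from the commit message.

```python
import functools
import http.server
import threading


def serve_static(directory, host="127.0.0.1", port=17000):
    """Serve `directory` as static files on a loopback address.

    Hypothetical helper; roughly the programmatic form of
    `python3 -m http.server 17000 --bind 127.0.0.1 --directory <dir>`.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # Run in a daemon thread so the caller keeps control of the process.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Binding to 127.0.0.1 keeps the listener off public interfaces, which matches the "local-only" intent. Call `server.shutdown()` followed by `server.server_close()` to stop it.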
Haitao Pan
3505ff1c31 fix(ai-workspace): deploy robustness on Debian13/Ubuntu26.04 (py3.13)
- setup-xworkspace-console.yaml:
  - xworkspace_console_user follows ansible_env.USER (was hardcoded ubuntu;
    mismatched home=/root on root connections -> systemd link 'src does not exist')
  - runtime apt task async/poll (xfce4 desktop install dropped the SSH session)
  - api_dir -> bin/ to match prebuilt runtime manifest (apiBinary: bin/xworkspace-api;
    was api/ -> 203/EXEC crash loop)
- roles/ai_agent_runtime/tasks/{main,docs,fonts,browser}.yml: apt lock_timeout
  (texlive/pandoc raced cloud-init/unattended-upgrades for the dpkg lock)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 03:02:43 +08:00
Haitao Pan
a5e19eff60 chore: qmd version bump, macOS container runtime deps, ignore inventory pycache
- roles/vhosts/common: add docker/docker-compose/colima to macOS brew deps
  (headless container runtime for qmd PG memory-bridge tests)
- roles/vhosts/qmd: bump qmd_version
- .gitignore: ignore inventory/__pycache__/

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:01:57 +08:00
Haitao Pan
df48cb4f5a feat(inventory): add Terraform CMDB dynamic inventory for ai-workspace
Reads cmdb.json produced by iac_modules vultr-vps/envs/ai-workspace
generate.py and exposes hosts/groups/hostvars to Ansible, linking IaC
provisioning to playbook deploys (terraform_cmdb.py).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:57:58 +08:00
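A dynamic inventory of the kind described above boils down to mapping a CMDB document onto the JSON layout Ansible expects from an inventory script (`_meta.hostvars` plus one entry per group). The sketch below is illustrative only: the real cmdb.json schema is not shown in the commit, so the `hosts`, `name`, `ip`, and `groups` keys are assumptions, and `build_inventory` is a hypothetical name rather than a function from terraform_cmdb.py.

```python
import json


def build_inventory(cmdb):
    """Convert a CMDB dict into Ansible's dynamic-inventory JSON layout.

    Assumed input schema (hypothetical): {"hosts": [{"name", "ip", "groups"}]}.
    """
    inventory = {"_meta": {"hostvars": {}}, "all": {"children": []}}
    for host in cmdb.get("hosts", []):
        name = host["name"]
        # Host variables go under _meta so Ansible needs no per-host calls.
        inventory["_meta"]["hostvars"][name] = {"ansible_host": host["ip"]}
        for group in host.get("groups", []):
            entry = inventory.setdefault(group, {"hosts": []})
            entry["hosts"].append(name)
            if group not in inventory["all"]["children"]:
                inventory["all"]["children"].append(group)
    return inventory


def render(cmdb):
    """Return the JSON text an inventory script would print for --list."""
    return json.dumps(build_inventory(cmdb), indent=2, sort_keys=True)
```

A script built this way would typically load cmdb.json from disk and print `render(...)` when Ansible invokes it with `--list`.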
Haitao Pan
099a144a9e fix(xworkmate-bridge): define missing xworkmate_bridge_caddy_base_dir
xworkmate_bridge_obsolete_caddy_fragment_paths references
xworkmate_bridge_caddy_base_dir, but the var was never defined, so the
'Inspect deprecated ACP Caddy fragment' task aborted with
'xworkmate_bridge_caddy_base_dir is undefined'. Define it from the global
caddy_config_dir (consistent with the role's other caddy paths), which is
already OS-aware (/etc/caddy on Linux, Homebrew prefix on macOS).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:42:20 +08:00
Haitao Pan
f5a5979439 fix(acp-gemini): create runtime dirs so service WorkingDirectory exists
acp-gemini.service sets WorkingDirectory={{ acp_gemini_workdir }} (~/.gemini)
but the role never created it, so systemd failed at step CHDIR (status
200/CHDIR), the adapter never bound 127.0.0.1:8791, and the CORS preflight
validation failed after 30 retries. Mirror the opencode role: pre-create the
home, .gemini workdir, XDG config and state dirs owned by the service user.
Linux/Debian only (guarded != Darwin); macOS uses the launchd path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:31:38 +08:00
Haitao Pan
e5fc29fa8a fix(console): download runtime from deterministic latest-runtime tag
The online runtime download used releases/latest/download, which GitHub
resolves to whichever release holds the 'Latest' flag. The console repo also
publishes offline-ai-workspace-* build releases that take that flag and carry
no console runtime asset -> HTTP 404 on the online/Debian path. Point at the
stable latest-runtime release (published by the console-runtime workflow) and
add a bounded download retry. The env-provided archive path still wins via the
existing when-guard, so offline/bundled installs are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:19:03 +08:00
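The difference between the floating and the pinned download URL, plus a bounded retry, can be sketched as follows. The two URL shapes are GitHub's standard release-asset forms; the owner, repo, and asset arguments are placeholders, and `fetch_with_retry` is a hypothetical helper, not the role's actual task.

```python
import time


def release_asset_url(owner, repo, asset, tag=None):
    """Build a GitHub release download URL.

    tag=None gives the floating `releases/latest/download` form, which follows
    whichever release carries the "Latest" flag; a tag pins one release.
    """
    base = f"https://github.com/{owner}/{repo}/releases"
    if tag is None:
        return f"{base}/latest/download/{asset}"
    return f"{base}/download/{tag}/{asset}"


def fetch_with_retry(fetch, url, attempts=3, delay=2.0, sleep=time.sleep):
    """Call fetch(url) at most `attempts` times, pausing between failures."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

Note that this sketch retries every `OSError`, including a 404; a retry only helps with transient failures, which is why pinning the tag is the actual fix and the retry is a secondary safeguard.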
Haitao Pan
9e81f65a62 fix(openclaw): pull multi-session plugin runtime from deterministic runtime-latest asset
The download used releases/latest/download, which GitHub resolves to the
human-facing v0.1.12 tag (no runtime asset) -> HTTP 404, failing the deploy
on Ubuntu 26.04 (and any platform). Point at the stable runtime-latest
release published by the plugin repo's runtime-release workflow, and add a
bounded retry around the download for transient network errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:03:23 +08:00
e0bfc765bf feat(litellm): make model registration idempotent via fallback to /model/update 2026-06-23 13:43:42 +08:00
4e183d2d44 fix(litellm): resolve os.environ variables locally before registering models to DB 2026-06-23 13:27:09 +08:00
28df3b59d6 feat(openclaw): conditionally render default UI models and providers based on active API keys 2026-06-23 13:09:56 +08:00
a0d59c0af1 feat(openclaw): adopt native provider simulation pointing to litellm gateway 2026-06-23 12:42:04 +08:00
25b8204b7b fix(openclaw): use hyphens for litellm models to prevent provider intercept 2026-06-23 12:21:45 +08:00
6e260a3425 feat(litellm): ensure deepseek-chat and deepseek-reasoner are registered 2026-06-23 12:18:24 +08:00
e7c96675ff feat(litellm): update model registrations and gateway configurations with API key gating 2026-06-23 11:04:21 +08:00
Haitao Pan
01f1499a60 feat(ai-workspace): consume prebuilt console runtime for final deployment
The macOS console API previously ran via `go run .`, which fails under
launchd's minimal PATH (no `go`) and recompiles on every launch. Switch to
the same prebuilt-runtime consumption model the bridge/qmd/litellm runtimes
already use.

The ai-workspace role now does final deployment only (never builds):
- download xworkspace-console-runtime-<os>-<arch>.tar.gz (incl. darwin-arm64)
  from the latest-runtime release, or use an offline-staged archive via
  XWORKSPACE_CONSOLE_RUNTIME_ARCHIVE;
- unpack to a per-user system dir (~/.local/share/xworkspace-console),
  idempotent via a sha256 marker;
- read manifest.json to resolve the prebuilt API binary and assert it is a
  present, executable native binary;
- on macOS, deploy a LaunchAgent that sources portal.env and execs the
  prebuilt binary directly — no go, no Homebrew, no PATH games.

The Go API is pure-Go (no cgo), so CI cross-compiles darwin-arm64 cleanly;
this role only consumes that artifact. Validated end-to-end on darwin-arm64:
packaged binary serves :8788 (200 with token, 401 without) under launchd.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 17:04:55 +08:00
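The unpack-once and manifest-check steps listed in the commit above can be sketched like this. It is an illustration under assumptions, not the role's tasks: the marker file name and both function names are hypothetical, manifest.json is assumed to sit at the archive root, and the `apiBinary` key is taken from the manifest field quoted in commit 3505ff1c31.

```python
import hashlib
import json
import os
import tarfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large archives are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unpack_if_changed(archive, dest, marker_name=".runtime.sha256"):
    """Unpack `archive` into `dest` unless the marker already records its hash.

    Returns True when an unpack happened. The marker name is hypothetical.
    """
    dest = Path(dest)
    marker = dest / marker_name
    checksum = sha256_of(archive)
    if marker.is_file() and marker.read_text().strip() == checksum:
        return False
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
    marker.write_text(checksum + "\n")
    return True


def resolve_api_binary(dest):
    """Read manifest.json and check the API binary is present and executable."""
    dest = Path(dest)
    manifest = json.loads((dest / "manifest.json").read_text())
    binary = dest / manifest["apiBinary"]
    if not binary.is_file() or not os.access(binary, os.X_OK):
        raise FileNotFoundError(f"API binary missing or not executable: {binary}")
    return binary
```

Writing the marker only after a successful extract means an interrupted unpack is retried on the next run instead of being mistaken for a finished one.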
Haitao Pan
a5850cfcee fix(acp_server_gemini): revert incompatible adapter command syntax and update args for antigravity-cli 2026-06-22 13:59:52 +08:00
Haitao Pan
2a85be5c9b fix(xworkmate_bridge): remove obsolete IMAGE variable causing undefined errors 2026-06-22 13:55:14 +08:00
Haitao Pan
32e00a8617 fix(litellm,validation): refine model registration and add cross-platform service validation 2026-06-22 13:52:05 +08:00
Haitao Pan
0ac424f00e Merge branch 'xworkspace-portal-dashboard-17000'
# Conflicts:
#	setup-xworkspace-console.yaml
2026-06-22 13:27:37 +08:00
Haitao Pan
1b2aea005a Merge branch 'refactor/upgrade-antigravity-cli'
# Conflicts:
#	roles/vhosts/acp_server_gemini/defaults/main.yml
#	roles/vhosts/acp_server_gemini/templates/gemini.plist.j2
2026-06-22 13:26:30 +08:00
Haitao Pan
93a3067ea4 Merge branch 'codex/openclaw-playbook-concurrency'
# Conflicts:
#	roles/vhosts/gateway_openclaw/templates/openclaw.json.j2
#	roles/vhosts/xworkmate_bridge/defaults/main.yml
2026-06-22 13:25:45 +08:00
Haitao Pan
9926a46f76 fix(litellm): percent-encode DB password in DATABASE_URL
LiteLLM crash-looped on macOS with Prisma `P1013: invalid port number in
database URL`. The shared auth token is generated by `openssl rand -base64`
and can contain '/', '+' or '='; injected raw into the DATABASE_URL
userinfo, a '/' truncates the authority so the port parses as invalid and
proxy startup fails (port 4000 never binds).

Percent-encode the password for the DATABASE_URL only, via an explicit
reserved-set replace chain ('%' first to avoid double-encoding) since
Jinja's urlencode leaves '/' unescaped. The DB user password stays raw in
provision-database and LITELLM_DB_PASSWORD, and the URL form decodes back
to the identical secret (verified round-trip), so authentication is
unchanged. No effect when no DB host is configured.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:56:56 +08:00
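The encoding rule in the commit above is easy to check outside Jinja. The sketch below mirrors the "replace chain, '%' first" idea in Python; the exact reserved set used by the role is not shown in the commit, so the list here is an illustrative assumption, and both function names are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit

# '%' must be replaced first; otherwise the '%' produced by later
# replacements would itself be encoded a second time.
RESERVED = [
    ("%", "%25"),
    ("/", "%2F"),
    ("+", "%2B"),
    ("=", "%3D"),
    ("@", "%40"),
    (":", "%3A"),
]


def encode_password(password):
    """Percent-encode characters that are unsafe in URL userinfo."""
    for char, escaped in RESERVED:
        password = password.replace(char, escaped)
    return password


def database_url(user, password, host, port, name):
    """Build a PostgreSQL URL with only the password percent-encoded."""
    return f"postgresql://{user}:{encode_password(password)}@{host}:{port}/{name}"


def round_trips(password):
    """Check that the encoded form decodes back to the identical secret."""
    return unquote(encode_password(password)) == password
```

For passwords made of base64 characters this chain agrees with `quote(password, safe="")`, which is the simpler choice when writing Python directly. Parsing the result with `urlsplit` is a quick way to confirm that the host and port survive intact.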
Haitao Pan
ef67c61cf7 fix(xfce): skip Linux XFCE/XRDP desktop stack on macOS
The all-in-one flow reached "Update apt cache" in the
xfce_desktop_minimal_runtime role on macOS and failed with
`[Errno 2] No such file or directory: b'update'` (no apt on Darwin).

XFCE + XRDP is a Linux remote-desktop stack and is meaningless on macOS,
which already has a native GUI. Guard both role includes in
setup-xfce-xrdp.yaml with `ansible_os_family != 'Darwin'` so the apt/systemd
tasks never run there. Linux behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:46:31 +08:00
Haitao Pan
6091b9dbcf fix(qmd): pin Homebrew node@24 for build and status on macOS
`qmd status` aborted with ERR_DLOPEN_FAILED — better-sqlite3 was compiled
against NODE_MODULE_VERSION 137 (node@24) but the validate-status task ran
under nvm's Node 20 (NODE_MODULE_VERSION 115), because the user's PATH puts
nvm node ahead of Homebrew and the task pinned no PATH.

Pin `/opt/homebrew/bin` (node@24) ahead of nvm on Darwin for the npm
install, npm build, and validate-status tasks so the native module is
built and loaded against one consistent Node ABI — the same node@24 the
launchd plist already uses. Linux PATH is left unchanged via an
ansible_os_family conditional.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:43:05 +08:00
Haitao Pan
d9033960fd fix(qmd): drop undefined nodejs_version from macOS LaunchAgent PATH
The QMD launchd plist hardcoded an NVM node path
(`~/.nvm/versions/node/{{ nodejs_version }}/bin`), but `nodejs_version` is
never defined in the Homebrew-based macOS deploy, so "Deploy QMD
LaunchAgent" aborted with `AnsibleUndefinedVariable: 'nodejs_version' is
undefined`.

QMD is a bun binary and the Linux user unit already uses
`.bun/bin:.local/bin:...`. Mirror that for the plist PATH and add the
Homebrew prefix (`/opt/homebrew/bin`) for the brew-installed node@24,
removing the nvm/nodejs_version dependency entirely (same remedy as the
console plist in TC-MAC-005).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:32:58 +08:00
Haitao Pan
bbf5260f0d fix(litellm): put venv bin on PATH for prisma generate on macOS
`prisma generate` invokes the `prisma-client-py` generator as a `/bin/sh`
subprocess, which is resolved via PATH. Even though the role calls the
absolute venv `prisma` binary, the generator console script lives in the
same venv bin dir that is not on the default command PATH, so generation
failed with "prisma-client-py: command not found" on macOS.

Add an `environment.PATH` that prepends the venv bin dir (plus Homebrew
prefixes) so the generator subprocess resolves.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:17:12 +08:00
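The PATH problem described above is generic: calling one tool by absolute path does not help a child process that looks up a second tool by name. A minimal sketch of the remedy, with hypothetical helper names and no claim to match the role's actual `environment.PATH` value:

```python
import os
import shutil


def path_with_prefix(prefixes, base=None):
    """Return a PATH string with `prefixes` placed ahead of existing entries."""
    base = os.environ.get("PATH", "") if base is None else base
    parts = [p for p in prefixes if p]
    parts += [p for p in base.split(os.pathsep) if p]
    return os.pathsep.join(parts)


def resolve(command, path):
    """Resolve `command` the way a PATH lookup would, or return None."""
    return shutil.which(command, path=path)
```

Passing the result as the `PATH` entry of a subprocess environment lets a name-based lookup such as `prisma-client-py` succeed. Because earlier entries win, the same ordering idea applies to the node@24 versus nvm case in commit 6091b9dbcf.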
Haitao Pan
ce2070e779 fix(litellm): repair macOS dependency version probe one-liner
The "Inspect installed LiteLLM dependency versions" probe was written as a
multi-line Python program under YAML `>-` folding, which collapses every
newline into a space. The resulting single logical line contained a
`for ... : try: ... except:` block, which is a SyntaxError. With
`failed_when: false` the failure was swallowed, leaving stdout empty, and
the subsequent `set_fact` crashed in `from_json('')` with
"Expecting value: line 1 column 1 (char 0)".

Rewrite the probe as a genuinely single-line program (dict/list
comprehensions over importlib.metadata.distributions(), joined by `;`),
and harden the decision `set_fact` with `default('{}', true)` so an empty
or malformed probe degrades to "install required" instead of aborting the
play.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:16:57 +08:00
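Both failure modes in the commit above can be reproduced in plain Python. The sketch simulates the folding as a simple newline-to-space join, as the commit describes it; the probe strings are illustrative stand-ins, not the role's actual probe, and `parse_probe` is a hypothetical analogue of the hardened `set_fact`.

```python
import json

# A multi-line probe: valid Python as written.
MULTI_LINE = (
    "import json\n"
    "for name in ['a']:\n"
    "    try:\n"
    "        print(name)\n"
    "    except Exception:\n"
    "        pass"
)

# What remains once every newline has been collapsed into a space.
FOLDED = " ".join(line.strip() for line in MULTI_LINE.splitlines())

# A genuinely single-line probe: statements joined by ';' and a comprehension.
ONE_LINER = (
    "import json, importlib.metadata as m; "
    "print(json.dumps({d.metadata['Name']: d.version for d in m.distributions()}))"
)


def is_valid_python(source):
    """Return True if `source` compiles, False on SyntaxError."""
    try:
        compile(source, "<probe>", "exec")
        return True
    except SyntaxError:
        return False


def parse_probe(stdout):
    """Parse probe output, degrading to {} when it is empty or malformed."""
    try:
        return json.loads(stdout or "{}")
    except json.JSONDecodeError:
        return {}
```

Here `is_valid_python(FOLDED)` is False while `is_valid_python(ONE_LINER)` is True, and `parse_probe("")` returns an empty dict instead of raising. The `except` branch goes slightly beyond an empty-string default and is an extra assumption of this sketch.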
f4a30b9e01 fix(litellm): resilient online dependency install
litellm[proxy] pulls large wheels (polars-runtime ~46MB) that break mid-stream
over slow/mirrored links with IncompleteRead, failing the deploy. Add pip
--retries/--resume-retries (resumes partial downloads) and a longer timeout,
tunable via litellm_pip_* vars, and upgrade pip in the venv first so
--resume-retries (pip>=25.1) exists.
2026-06-22 02:42:51 +00:00
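The version gate mentioned above (only pass `--resume-retries` once pip is at least 25.1) can be sketched as a small argument builder. The default values and function names are hypothetical and do not reflect the role's litellm_pip_* defaults.

```python
def parse_version(text):
    """Turn '25.1.1' into (25, 1); non-numeric parts end the parse."""
    parts = []
    for piece in text.split(".")[:2]:
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def pip_install_args(package, pip_version, retries=10, resume_retries=5, timeout=120):
    """Build `pip install` arguments that tolerate slow or broken links."""
    args = ["install", "--retries", str(retries), "--timeout", str(timeout)]
    # --resume-retries is only understood by pip 25.1 and newer.
    if parse_version(pip_version) >= (25, 1):
        args += ["--resume-retries", str(resume_retries)]
    return args + [package]
```

Upgrading pip inside the venv first, as the commit does, keeps the gated branch reachable; the gate itself only guards against an older pip rejecting the unknown flag.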
Haitao Pan
6a2f05f435 fix(litellm): skip redundant dependency installs 2026-06-21 22:34:34 +08:00
Haitao Pan
71ebe6444c fix(litellm): isolate runtime in Python 3.13 venv 2026-06-21 21:15:21 +08:00
Haitao Pan
c11f51b4c9 fix(openclaw): allow version-matched acpx plugin 2026-06-21 21:07:21 +08:00
Haitao Pan
f01e0bb15b fix(qmd): provision macOS LaunchAgent 2026-06-21 21:05:59 +08:00
Haitao Pan
09a39e69ee perf(openclaw): avoid unnecessary doctor repairs 2026-06-21 20:54:01 +08:00
Haitao Pan
02667f9e76 Merge remote-tracking branch 'origin/main' 2026-06-21 20:41:41 +08:00
Haitao Pan
65e45a4834 fix(vhosts): make macOS defaults and vault tasks platform aware 2026-06-21 20:41:32 +08:00
Haitao Pan
f231867593 Merge branch 'fix/xworkmate-windows-handler' into HEAD 2026-06-21 20:40:03 +08:00
Haitao Pan
1dd0ce2e04 fix(xworkmate_bridge): use correct Windows command module 2026-06-21 20:38:14 +08:00
Haitao Pan
9f04d4d9b5 perf(nodejs): configure USTC Homebrew mirrors to accelerate installation 2026-06-21 20:20:53 +08:00
Haitao Pan
4f87b67a4e feat(xworkmate_bridge): add Windows Scheduled Task deployment and skip Caddy on Windows 2026-06-21 20:18:11 +08:00
Haitao Pan
51d08cf9db perf(nodejs): disable homebrew auto update to speed up installation 2026-06-21 20:14:03 +08:00
Haitao Pan
aa3b4e8069 fix(gateway_openclaw): resolve npm path resolution, remove obsolete plugins via CLI, and make doctor non-interactive 2026-06-21 20:10:50 +08:00
Haitao Pan
85bad4155f fix(gateway_openclaw): resolve parse error on windows module in handlers 2026-06-21 19:57:37 +08:00
Haitao Pan
48ba854671 fix(litellm): use explicit postgres database for psql commands to prevent connection errors on macOS 2026-06-21 19:56:44 +08:00
Haitao Pan
fa04606542 feat(gateway_openclaw): run doctor --fix --force on restart 2026-06-21 19:54:58 +08:00
Haitao Pan
aedf457ddc feat(gateway_openclaw): add Windows tasks implementation 2026-06-21 19:53:26 +08:00
Haitao Pan
5f7bc697fc fix(playbooks): use include_tasks for windows and force node24 path for openclaw 2026-06-21 19:52:14 +08:00
Haitao Pan
284c3c43a3 feat(ai_agent_runtime): add Windows tasks implementation 2026-06-21 19:48:52 +08:00
Haitao Pan
340de0c4d8 feat(nodejs): add Windows tasks implementation 2026-06-21 19:43:16 +08:00