include_role passed `nodejs_version: "{{ ai_agent_runtime_nodejs_version |
default(nodejs_version) }}"` — a var named nodejs_version whose template
references nodejs_version itself. Ansible 2.19+'s lazy templating detects the
self-reference in the AST and fails the nodejs role's `nodejs_version_major`
set_fact with "Recursive loop detected: maximum recursion depth exceeded".
Use default(omit) so the nodejs role's own default applies when the
ai_agent_runtime override is absent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On re-run, "Ensure compose directories exist" reset the bind-mounted data dir
to root:root 0700. The official postgres image only chowns/initdb's an EMPTY
PGDATA, so a non-empty data dir stayed root-owned while the backend runs as uid
999 -> "could not open file global/pg_filenode.map: Permission denied" (pg_isready
still passes, masking it; ALTER USER / real queries fail).
Split the dir task: compose project dir stays root:root; data dir is created
owned by postgresql_container_uid/gid (default 999), idempotent across re-runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a host's Chrome apt repo already carries a newer build than a pinned version,
apt refuses with "Packages were downgraded and -y was used without
--allow-downgrades". Set allow_downgrade: true so an explicit (older-but-available)
pin installs cleanly. Complements the empty-default fix (e174e8b): default path
installs latest, pinned path now also robust.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deploy failed on ubuntu26 with "no available installation candidate for
google-chrome-stable=149.0.7827.114-1": Google's apt repo only ever carries the
current stable, so any pinned build vanishes within weeks.
Two fixes:
- defaults: xfce_google_chrome_version "" (install latest google-chrome-stable);
pin is opt-in and now safe (auto-falls back to latest when the pin is gone).
- browser.yml: the madison availability guard used POSIX [[:space:]], which
Python re does not support, so it never matched ' | ' separators. Replace with
\s — verified: empty->latest, pinned+available->pin, pinned+gone->latest.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
litellm's pinned fork requires Python <3.14; Ubuntu 26.04 ships 3.14 with no
3.13/3.12 in apt, so the venv pip install fails ('requires a different Python').
When the bootstrap interpreter is >=3.14, install a standalone Python 3.13 via
uv, rebuild the venv with it, and proceed. Debian 13 (3.13) is unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The uri probe ran 1s after the service (re)start while the adapter still accepts
TCP but doesn't yet answer (read hangs); uri's default 30s timeout + retries/until
did not actually loop on a connection timeout, so it failed after one attempt.
Replace with a curl retry loop (5s per attempt, up to ~30 tries) — the adapter
answers acp.capabilities in ~4ms once ready.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Non-empty pass-through check: xworkmate_bridge_domain feeds /etc/hostname and the
caddy site name; an empty/non-FQDN/127.0.0.1 value yields an invalid Caddyfile.
Assert a valid FQDN when caddy_enabled (public ingress), with a clear remediation
message (set XWORKMATE_BRIDGE_DOMAIN or provide CMDB service_domains).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Same class as bridge/litellm: ansible.builtin.package dispatched to apt inherits
the play's templated module_defaults.apt.lock_timeout un-rendered -> int conversion
error -> on-host bootstrap aborts before litellm/qmd. Use apt on Debian, keep
package for non-Debian (yum/dnf doesn't inherit the apt default).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- xworkmate_bridge_domain falls back to the first CMDB service_domains entry
(inventory hostvar / pipeline-injected env) before ai_workspace_public_domain.
- New task sets the host's /etc/hostname (and running hostname) to that FQDN on
Linux when it's a valid FQDN — never 127.0.0.1/localhost. The caddy site
(xworkmate-bridge-site.caddy.j2) already uses the same var.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The runtime plays set module_defaults.apt.lock_timeout to a templated value.
When a prerequisite task uses ansible.builtin.package (which dispatches to apt
on Debian), that templated default is NOT rendered and the literal
'{{ ai_workspace_apt_lock_timeout | default(900) | int }}' reaches apt ->
'lock_timeout is of type str ... cannot be converted to an int' -> the whole
on-host bootstrap aborts at the xworkmate-bridge prereq, before litellm/qmd
ever deploy (hence they were never up).
Fix: install prereqs via ansible.builtin.apt on Debian/Ubuntu (template renders
like every other apt task); keep ansible.builtin.package for non-Debian Linux
(dispatches to yum/dnf, which doesn't inherit the apt default).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Make the role work identically under both execution models:
- local/pull (curl|bash -> ansible-playbook -c local; localhost == host)
- remote controller (ansible-playbook -i inventory over ssh; tasks run on host)
Changes:
- Remove ALL delegate_to: localhost (the old raw 'command: rsync' detected
local-vs-remote via ansible_connection, but delegate_to localhost forced it
to 'local', so the user@host push branch was dead code -> remote runs wrote
to the controller's /root and failed).
- Acquire xworkspace-core-skills via ansible.builtin.git clone ON THE HOST
(most universal/cross-platform), instead of requiring a controller-side dir.
- Merge core skills into the canonical dir with ansible.builtin.copy
(remote_src, host-local) instead of raw rsync; installer adapters install
directly into the canonical dir on the host.
- Drop rsync-only vars/excludes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Prebuilt runtime ships only dashboard/dist (no package.json) so npm run
preview ENOENT-crash-loops (254). console is a local-only static backend on
127.0.0.1:17000 (dashboard is a routerless SPA); serve it with python3
-m http.server on both Linux (console.service) and macOS (console.plist) —
no second caddy (avoids clashing with the system caddy on :80; console is
local-only and not proxied by default). Gate the apt caddy install on
caddy_enabled (true on public-IP Linux VPS for the bridge ingress; macOS
installs no caddy).
Verified: debian13 + ubuntu26.04 console.service active serving 17000=200;
macOS python3 serves the same dist locally.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reads cmdb.json produced by iac_modules vultr-vps/envs/ai-workspace
generate.py and exposes hosts/groups/hostvars to Ansible, linking IaC
provisioning to playbook deploys (terraform_cmdb.py).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
xworkmate_bridge_obsolete_caddy_fragment_paths references
xworkmate_bridge_caddy_base_dir, but the var was never defined, so the
'Inspect deprecated ACP Caddy fragment' task aborted with
'xworkmate_bridge_caddy_base_dir is undefined'. Define it from the global
caddy_config_dir (consistent with the role's other caddy paths), which is
already OS-aware (/etc/caddy on Linux, Homebrew prefix on macOS).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
acp-gemini.service sets WorkingDirectory={{ acp_gemini_workdir }} (~/.gemini)
but the role never created it, so systemd failed at step CHDIR (status
200/CHDIR), the adapter never bound 127.0.0.1:8791, and the CORS preflight
validation failed after 30 retries. Mirror the opencode role: pre-create the
home, .gemini workdir, XDG config and state dirs owned by the service user.
Linux/Debian only (guarded != Darwin); macOS uses the launchd path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The online runtime download used releases/latest/download, which GitHub
resolves to whichever release holds the 'Latest' flag. The console repo also
publishes offline-ai-workspace-* build releases that take that flag and carry
no console runtime asset -> HTTP 404 on the online/Debian path. Point at the
stable latest-runtime release (published by the console-runtime workflow) and
add a bounded download retry. The env-provided archive path still wins via the
existing when-guard, so offline/bundled installs are unaffected.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The download used releases/latest/download, which GitHub resolves to the
human-facing v0.1.12 tag (no runtime asset) -> HTTP 404, failing the deploy
on Ubuntu 26.04 (and any platform). Point at the stable runtime-latest
release published by the plugin repo's runtime-release workflow, and add a
bounded retry around the download for transient network errors.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The macOS console API previously ran via `go run .`, which fails under
launchd's minimal PATH (no `go`) and recompiles on every launch. Switch to
the same prebuilt-runtime consumption model the bridge/qmd/litellm runtimes
already use.
The ai-workspace role now does final deployment only (never builds):
- download xworkspace-console-runtime-<os>-<arch>.tar.gz (incl. darwin-arm64)
from the latest-runtime release, or use an offline-staged archive via
XWORKSPACE_CONSOLE_RUNTIME_ARCHIVE;
- unpack to a per-user system dir (~/.local/share/xworkspace-console),
idempotent via a sha256 marker;
- read manifest.json to resolve the prebuilt API binary and assert it is a
present, executable native binary;
- on macOS, deploy a LaunchAgent that sources portal.env and execs the
prebuilt binary directly — no go, no Homebrew, no PATH games.
The Go API is pure-Go (no cgo), so CI cross-compiles darwin-arm64 cleanly;
this role only consumes that artifact. Validated end-to-end on darwin-arm64:
packaged binary serves :8788 (200 with token, 401 without) under launchd.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
LiteLLM crash-looped on macOS with Prisma `P1013: invalid port number in
database URL`. The shared auth token is generated by `openssl rand -base64`
and can contain '/', '+' or '='; injected raw into the DATABASE_URL
userinfo, a '/' truncates the authority so the port parses as invalid and
proxy startup fails (port 4000 never binds).
Percent-encode the password for the DATABASE_URL only, via an explicit
reserved-set replace chain ('%' first to avoid double-encoding) since
Jinja's urlencode leaves '/' unescaped. The DB user password stays raw in
provision-database and LITELLM_DB_PASSWORD, and the URL form decodes back
to the identical secret (verified round-trip), so authentication is
unchanged. No effect when no DB host is configured.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The all-in-one flow reached "Update apt cache" in the
xfce_desktop_minimal_runtime role on macOS and failed with
`[Errno 2] No such file or directory: b'update'` (no apt on Darwin).
XFCE + XRDP is a Linux remote-desktop stack and is meaningless on macOS,
which already has a native GUI. Guard both role includes in
setup-xfce-xrdp.yaml with `ansible_os_family != 'Darwin'` so the apt/systemd
tasks never run there. Linux behavior is unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`qmd status` aborted with ERR_DLOPEN_FAILED — better-sqlite3 was compiled
against NODE_MODULE_VERSION 137 (node@24) but the validate-status task ran
under nvm's Node 20 (NODE_MODULE_VERSION 115), because the user's PATH puts
nvm node ahead of Homebrew and the task pinned no PATH.
Pin `/opt/homebrew/bin` (node@24) ahead of nvm on Darwin for the npm
install, npm build, and validate-status tasks so the native module is
built and loaded against one consistent Node ABI — the same node@24 the
launchd plist already uses. Linux PATH is left unchanged via an
ansible_os_family conditional.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The QMD launchd plist hardcoded an NVM node path
(`~/.nvm/versions/node/{{ nodejs_version }}/bin`), but `nodejs_version` is
never defined in the Homebrew-based macOS deploy, so "Deploy QMD
LaunchAgent" aborted with `AnsibleUndefinedVariable: 'nodejs_version' is
undefined`.
QMD is a bun binary and the Linux user unit already uses
`.bun/bin:.local/bin:...`. Mirror that for the plist PATH and add the
Homebrew prefix (`/opt/homebrew/bin`) for the brew-installed node@24,
removing the nvm/nodejs_version dependency entirely (same remedy as the
console plist in TC-MAC-005).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`prisma generate` invokes the `prisma-client-py` generator as a `/bin/sh`
subprocess, which is resolved via PATH. Even though the role calls the
absolute venv `prisma` binary, the generator console script lives in the
same venv bin dir that is not on the default command PATH, so generation
failed with "prisma-client-py: command not found" on macOS.
Add an `environment.PATH` that prepends the venv bin dir (plus Homebrew
prefixes) so the generator subprocess resolves.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The "Inspect installed LiteLLM dependency versions" probe was written as a
multi-line Python program under YAML `>-` folding, which collapses every
newline into a space. The resulting single logical line contained a
`for ... : try: ... except:` block, which is a SyntaxError. With
`failed_when: false` the failure was swallowed, leaving stdout empty, and
the subsequent `set_fact` crashed in `from_json('')` with
"Expecting value: line 1 column 1 (char 0)".
Rewrite the probe as a genuinely single-line program (dict/list
comprehensions over importlib.metadata.distributions(), joined by `;`),
and harden the decision `set_fact` with `default('{}', true)` so an empty
or malformed probe degrades to "install required" instead of aborting the
play.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
litellm[proxy] pulls large wheels (polars-runtime ~46MB) that break mid-stream over slow/mirrored links with IncompleteRead, failing the deploy. Add pip --retries/--resume-retries (resumes partial downloads) + longer timeout, tunable via litellm_pip_* vars, and upgrade pip in the venv first so --resume-retries (pip>=25.1) exists.