xworkspace-console/docs/en/case/macos_compatibility_tests.md

35 KiB

🇺🇸 English | 🇨🇳 中文

macOS Compatibility Deployment Test Cases

This document records the cross-platform compatibility issues encountered and their fix solutions during the fully automated deployment of setup-ai-workspace-all-in-one.sh in the macOS (Darwin) environment.

Core Background

The original script and Ansible Playbooks were designed for Debian/Ubuntu Linux, strongly relying on root permissions, the apt package manager, system directories (/usr/local/sbin, /etc/systemd), and default user paths (/home/ubuntu). Deploying in unprivileged mode on macOS triggered a massive amount of permission and path exceptions.


TC-MAC-001: TTYD Binary and Path Exceptions

Item Content
Trigger File setup-ai-workspace-all-in-one.sh
Trigger Error The script attempts to download the ttyd binary and write it to /usr/local/bin/ttyd, but lacks permissions and the architecture mismatches
Fix Solution Intercept binary download under Darwin, switch to brew install ttyd; use command -v ttyd to dynamically resolve the path

TC-MAC-002: Global Privilege Escalation (Sudo) Blocking

Item Content
Trigger File setup-ai-workspace-all-in-one.sh → Ansible Playbook
Trigger Error sudo: a password is required
Fix Solution Inject --extra-vars "ansible_become=false" under Darwin to cancel automatic privilege escalation

TC-MAC-003: Default User Group Allocation Failure

Item Content
Trigger File setup-xworkspace-console.yaml
Trigger Error chown cannot find the ubuntu group
Fix Solution Conditional rendering: "{{ 'staff' if ansible_os_family == 'Darwin' else 'ubuntu' }}"

TC-MAC-004: Hardcoded Paths

Item Content
Trigger File setup-xworkspace-console.yaml Header Variables Area
Trigger Error cd /home/ubuntu/xworkspace-console/dashboard: No such file or directory
Fix Solution Refactor xworkspace_console_home to {{ ansible_env.HOME }}, and chain-evaluate all derived directories

TC-MAC-005: Template Engine Rendering Exception (Undefined Variable)

Item Content
Trigger File console.plist.j2
Trigger Error AnsibleUndefinedVariable: 'nodejs_version' is undefined
Fix Solution Remove NVM environment initialization and nodejs_version dependency, directly append /opt/homebrew/bin to PATH

TC-MAC-006: NPM Global Helper Script Installation Refused

Item Content
Trigger File roles/ai_agent_runtime/tasks/nodejs.yml
Trigger Error chown failed: [Errno 1] Operation not permitted: '/usr/local/sbin/...'
Fix Solution Downgrade installation path to ~/.local/bin under macOS, create directory beforehand, turn off become

TC-MAC-007: Playwright Hardcoded Associated Call Failure

Item Content
Trigger File roles/ai_agent_runtime/tasks/nodejs.yml
Trigger Error [Errno 13] Permission denied: '/usr/local/sbin/ai-workspace-manage-npm-global-package'
Fix Solution Uniformly use conditional path statements in all cmd

TC-MAC-008: Apt Browser Installation Crash

Item Content
Trigger File roles/ai_agent_runtime/tasks/browser.yml
Trigger Error [Errno 2] No such file or directory: b'update' (macOS has no apt)
Fix Solution Add when: ansible_os_family != 'Darwin'; supplement macOS Chrome detection path; change environment variable script path to user directory

TC-MAC-009: Playwright Environment Variable Mount Directory Missing

Item Content
Trigger File roles/ai_agent_runtime/tasks/browser.yml
Trigger Error Destination directory ~/.local/state/ai-workspace/env does not exist
Fix Solution Pre-create the env directory; add default(ansible_env.HOME) to the variable for fault tolerance

TC-MAC-010: Agent Skills Role Hardcoded Path and User

Item Content
Trigger File roles/agent_skills/defaults/main.yml, roles/agent_skills/tasks/main.yml
Trigger Error [Errno 45] Operation not supported: b'/home/ubuntu'
Fix Solution Change all defaults to ansible_env.USER/HOME; add Darwin skip to apt rsync installation

TC-MAC-011: Chromium Version Check Path Contains Spaces

Item Content
Trigger File roles/ai_agent_runtime/tasks/verify.yml
Trigger Error No such file or directory: b'/Applications/Google' (Path containing space is split)
Fix Solution Change ansible.builtin.command to use argv list format parameter passing to avoid space truncation

TC-MAC-012: XWorkMate Bridge Base Directory System Path Write Denied

Item Content
Trigger File setup-ai-workspace-all-in-one.shroles/vhosts/xworkmate_bridge (Variable xworkmate_bridge_base_dir)
Trigger Error TASK [roles/vhosts/xworkmate_bridge/ : Ensure xworkmate-bridge base directory exists]There was an issue creating /opt/cloud-neutral as requested: [Errno 13] Permission denied: b'/opt/cloud-neutral'
Root Cause xworkmate_bridge_base_dir is hardcoded to /opt/cloud-neutral/xworkmate-bridge by default. macOS runs with ansible_become=false, has no permission to write to /opt; moreover, /opt is not a standard macOS directory. This base dir is simultaneously referenced by config.yaml and the launchd plist's WorkingDirectory
Directory Strategy Linux maintains /opt/cloud-neutral/xworkmate-bridge; macOS switches to the Apple standard user-level application data directory ~/Library/Application Support/cloud-neutral/xworkmate-bridge
Fix Solution Two-layer: ① The Darwin branch of setup-ai-workspace-all-in-one.sh injects -e xworkmate_bridge_base_dir="$HOME/Library/Application Support/cloud-neutral/xworkmate-bridge" (curl | bash pulls the script from this repo, playbooks come from an independent repo, so -e on the script side is the only effective fix point under this path); ② The role's defaults/main.yml changes the default value to a ternary expression based on ansible_os_family, making the offline/local playbook path also correct
Effectiveness Prerequisite curl | bash pulls the script from GitHub main. The fix must first be pushed to main of ai-workspace-lab/xworkspace-console; otherwise, the remote is still the old script (extra-vars have the highest priority, if -e was executed it would never fall back to /opt, thereby determining the unfixed remote script was executed)

TC-MAC-013: Vault standalone Directory System Path Write Denied

Item Content
Trigger File roles/vhosts/vault/tasks/main.yml, roles/vhosts/vault/vars/main.yml, roles/vhosts/vault/tasks/macos.yml
Trigger Error TASK [roles/vhosts/vault/ : Ensure standalone Vault directories exist][Errno 13] Permission denied: b'/etc/vault.d', b'/opt/vault'
Root Cause The "Ensure standalone Vault directories exist" task creates /etc/vault.d and /opt/vault/data with owner: root, and lacks the ansible_os_family != 'Darwin' guard that other standalone tasks in the vault role have. macOS runs with become=false, has no permission to write to /etc, /opt, and chown of owner: root cannot complete. Unlike bridge (whose directory owner is the service user, fixable by -e), the owner: root in this task is hardcoded, cannot be overridden by extra-vars, and the role logic must be changed
Directory Strategy Linux maintains /etc/vault.d, /opt/vault/data; macOS switches to Apple standard ~/Library/Application Support/vault, ~/Library/Application Support/vault/data; macOS binary path takes /opt/homebrew/bin/vault (brew installation location), eliminating the need for /usr/local/bin symlinks that require sudo
Fix Solution The role is located in an independent playbooks repo, cannot be directly committed from this repo; reuse the script's existing "post-clone patch" mechanism (see patch_playbook_user_systemd), add patch_playbook_vault_macos() to setup-ai-workspace-all-in-one.sh, and only apply to the cloned vault role under Darwin: ① Append ansible_os_family != 'Darwin' guard to the directory creation task; ② Change vault_config_dir/vault_data_dir/vault_binary_path to OS-based ternary expressions; ③ Pre-create user-owned data directories (including launchd log directory ~/.local/state/xworkspace) in macos.yml. This patch is effective for both curl | bash and local execution paths, is idempotent, and does not alter Linux behavior

TC-MAC-014: common Role Linux Baseline (timedatectl, etc.) Fails on macOS

Item Content
Trigger File roles/vhosts/common/tasks/main.yml
Trigger Error `TASK [common : Base
Root Cause The `Base
Fix Solution Evaluated that these baselines are neither applicable nor authorized for execution on local macOS development deployments. Therefore, patch_playbook_common_macos() is added to setup-ai-workspace-all-in-one.sh (also via post-clone patch) to append the ansible_os_family != 'Darwin' guard to the entire `Base
Note The user only explicitly mentioned set timezone, but the subsequent Base tasks would fail continuously for the same reason, so they were guarded together to avoid step-by-step round trips

TC-MAC-015: Vault Admin Initialization Script Lacks Dependencies/PATH on macOS

Item Content
Trigger File roles/vhosts/vault/tasks/main.yml (Bootstrap task), roles/vhosts/vault/files/init_vault_admin.sh, roles/vhosts/vault/tasks/macos.yml
Trigger Error TASK [vault : Bootstrap Vault admin userpass auth] failed (no_log: true hides details). Vault is already up at this point (health check passed), failure occurs during execution of init_vault_admin.sh
Root Cause Script uses require_cmd vault/jq/curl/base64. macOS default does not include jq, and the "Install standalone Vault dependencies" (apt) task that installs jq is skipped by the != 'Darwin' guard → jq is missing; simultaneously, ansible.builtin.script uses a minimal PATH that doesn't include Homebrew's /opt/homebrew/bin, so even brew installed vault/jq might not be found
Fix Solution Extend patch_playbook_vault_macos(): ① Add brew install jq (creates: /opt/homebrew/bin/jq) in macos.yml; ② Append environment: PATH: "/opt/homebrew/bin:/usr/local/bin:{{ ansible_env.PATH }}" to the Bootstrap task, ensuring the script can find brew-installed vault/jq. The script itself already has macOS adaptation (base64 -D detection). Patch is idempotent, valid YAML, Linux unchanged
Note If it still fails, temporarily disable no_log on the task to view the real stderr of init_vault_admin.sh for further troubleshooting

TC-MAC-016: Vault Admin Initialization Non-Idempotent (re-run reports missing entityID)

Item Content
Trigger File roles/vhosts/vault/files/init_vault_admin.sh
Trigger Error Error writing data to identity/mfa/method/totp/admin-generate ... Code: 400 ... * missing entityID, accompanied by A login request was issued that is subject to MFA validation
Root Cause The script attempts to get entity_id by "logging in as that user" (auth/userpass/login/<user>). However, the script then creates a login-MFA enforcement for userpass. The dev mode Vault runs persistently across multiple deployments (launchd daemon), so in the second and subsequent deployments, this login is intercepted by MFA. It returns an MFA pending validation response instead of a complete token, entity_id is empty → admin-generate reports missing entityID. This is a re-run idempotency defect, not specific to macOS (Linux will fall into the same trap on the second run)
Fix Solution Stop relying on logins that will be intercepted by MFA: change to parsing entity_id via userpass identity entity-alias — iterate through identity/entity-alias/id, find the alias where name==user and mount_accessor==userpass accessor, and take its canonical_id; on the first run (no alias), explicitly create entity + entity-alias. Remove the vault token revoke which is subsequently no longer needed. Idempotent, backward compatible (can recognize implicitly created entities from older version logins). Fixed in the real playbooks repository init_vault_admin.sh; clone path synchronized via patch_playbook_vault_macos()
Troubleshooting Method The no_log: true on this task hid the error; temporarily changed no_log: false + register + wrote stdout/stderr to a mounted directory file, read directly to obtain the true error

TC-MAC-017: PostgreSQL Misuses compose Mode on macOS

Item Content
Trigger File roles/vhosts/postgres/tasks/compose.yml, roles/vhosts/postgres/defaults/main.yml
Trigger Error TASK [postgres : Materialize PostgreSQL admin password] failed (no_log: true). assert `postgresql_admin_password
Root Cause postgresql_deploy_mode defaults to compose. compose.yml follows the Docker path (check/install apt version of docker), and postgresql_admin_password is generated by default via lookup('password', '/root/.ai_workspace_postgres_password ...') — macOS has no permission to write to /root, lookup fails → password is empty → assert fails. The role actually has a native+macos.yml (Homebrew postgresql@16) path prepared, but wasn't switched to by default on macOS
Directory/Mode Strategy macOS deployment postgresql_deploy_mode=native (→ macos.yml, brew install); Linux deployment keeps default compose
Fix Solution Inject -e postgresql_deploy_mode=native in the Darwin branch of setup-ai-workspace-all-in-one.sh, and provide the password directly with append_secret_var postgresql_admin_password=$UNIFIED_AUTH_TOKEN (extra-vars have highest priority, completely bypassing the /root password lookup). Linux branch remains unchanged

TC-MAC-018: postgres native Install Misuses Expired Intel Homebrew Crash

Item Content
Trigger File roles/vhosts/postgres/tasks/macos.yml
Trigger Error Ensure PostgreSQL 16 is installed via Homebrew/usr/local/Homebrew/.../macos_version.rb: unknown or unsupported macOS version: "27.0" (MacOSVersion::Error)
Root Cause This task uses the community.general.homebrew module. The module auto-detects the brew prefix and hit the expired Intel Homebrew (/usr/local/Homebrew) on the machine. Its built-in macOS version table does not recognize 27.0, causing brew to crash on startup. Meanwhile, vault/openclaw using command: brew (which uses the available brew on PATH, like Apple Silicon's /opt/homebrew) worked fine—this is the module selecting the wrong brew, not brew being entirely unavailable
Fix Solution Align with vault/openclaw: change to ansible.builtin.command: brew install postgresql@16, and prepend /opt/homebrew/bin:/usr/local/bin to environment.PATH (prioritizing the available brew), add HOMEBREW_NO_AUTO_UPDATE=1; maintain idempotency using register+changed_when/failed_when. Real repository macos.yml is updated; clone path synchronized via patch_playbook_postgres_macos()
Note If the machine only has a single, expired brew (pure Intel), the root cause is the environment, requiring brew update/reinstalling Homebrew; this fix bypasses the issue as long as an "available brew exists" (the vault step proved an available brew exists)

TC-MAC-019: litellm Similarly Misuses Homebrew Module Crash

Item Content
Trigger File roles/vhosts/litellm/tasks/main.yml
Trigger Error Install LiteLLM prerequisites (macOS)/usr/local/Homebrew/.../macos_version.rb: unknown or unsupported macOS version: "27.0"
Root Cause Same root as TC-MAC-018: community.general.homebrew module hits expired Intel Homebrew and crashes
Fix Solution Change to ansible.builtin.command: brew install python@3.13 + prepend /opt/homebrew/bin:/usr/local/bin to environment.PATH + HOMEBREW_NO_AUTO_UPDATE=1. Real repository updated; clone path synchronized via patch_playbook_litellm_macos()
Note litellm still has macOS gaps to be handled individually subsequently: /root derived salt/db secret asserts, /etc/litellm config directory, pip/prisma tasks using become: true + become_user (service user is not created on macOS), DB provisioning, etc.

Current Progress Snapshot (2026-06-22)

The current macOS debugging entry point remains the public installation command:

curl -sfL https://install.svc.plus/ai-workspace | bash -

As of 2026-06-22, the xworkspace-console bootstrap entry, playbooks all-in-one role pipeline, and ai-workspace-services/litellm runtime release pipeline have formed a three-repository coordination. The macOS local deployment has bypassed early path, permission, Homebrew, Vault, PostgreSQL, OpenClaw, QMD and other blocking points. Current remaining risks mainly focus on the network stability of LiteLLM dependency installation, the product verification of the offline runtime release, and the final consecutive idempotent deployments.

Key commits pushed to ai-workspace-infra/playbooks:

Commit Theme Impact on macOS Deployment
09a39e6 perf(openclaw): avoid unnecessary doctor repairs Separates OpenClaw doctor and restart, preventing normal restart from triggering doctor --fix --force
f01e0bb fix(qmd): provision macOS LaunchAgent Supplements user-level LaunchAgent for QMD, supporting starting MCP service on macOS
c11f51b fix(openclaw): allow version-matched acpx plugin Supports version-matched acpx plugin, avoiding accidental kill by plugin registry assert
71ebe64 fix(litellm): isolate runtime in Python 3.13 venv LiteLLM changed to Python 3.13 venv isolation, avoiding mixing Python 3.13/3.14
6a2f05f fix(litellm): skip redundant dependency installs Adds package detection and install markers, skipping installed LiteLLM dependencies during repeated execution

Key commits pushed to ai-workspace-services/litellm:

Commit Theme Impact on macOS/Offline Deployment
51cde5e32 ci: add offline litellm runtime workflow Adds .github/workflows/offline-package-litellm-runtime.yaml, outputting litellm-runtime-<distro>-<version>-<arch>.tar.gz for console offline package scripts

Still need to use a clean install to verify if the remote script pointed to by install.svc.plus already contains the latest bootstrap logic. If the failure point still shows old tasks or old paths, first confirm whether the release entry point has been synchronized to the latest version of ai-workspace-lab/xworkspace-console@main.

TC-MAC-020: OpenClaw doctor is Too Heavy Causing Slow Handler

Item Content
Trigger File roles/vhosts/gateway_openclaw/handlers/main.yml
Trigger Phenomenon RUNNING HANDLER [roles/vhosts/gateway_openclaw/ : Repair OpenClaw health findings (POSIX)] takes about 5-6 seconds; previously restart and doctor were bound, normal config changes could also trigger openclaw doctor --fix --force --yes
Root Cause The handler coupled "lightweight restart" and "doctor repair", and --fix --force runs the repair path by default, suitable for real health issues, not suitable to run on every deployment conclusion
Fix Solution Doctor and restart have been split in playbooks: normally only do lightweight restart; trigger doctor only on actual package/config/plugin changes; prioritize lighter check/repair mode, reducing unrelated changes pulling up doctor
Verification Status Committed 09a39e6. Still need to observe if the OpenClaw handler is only triggered on real changes in a complete macOS deployment

TC-MAC-021: QMD Lacks macOS LaunchAgent

Item Content
Trigger File roles/vhosts/qmd/
Trigger Phenomenon QMD MCP port http://localhost:8181/mcp needs to run as a macOS user service, but the role lacks launchd provisioning
Root Cause The Linux/systemd path already has service management, macOS lacks user-level service descriptions like LaunchAgents/plus.svc.xworkspace.qmd.plist
Fix Solution Add QMD LaunchAgent: plus.svc.xworkspace.qmd, starting it as a macOS user-level service
Verification Status Committed f01e0bb. Still need to verify launchctl status and http://localhost:8181/mcp reachability after complete install

TC-MAC-022: OpenClaw Codex Plugin Compatibility Assert Accidental Kill

Item Content
Trigger File roles/vhosts/gateway_openclaw/tasks/main.yml
Trigger Error Assert OpenClaw Codex plugin matches gateway version fails, prompting that it must run @openclaw/codex 2026.6.1 and openclaw-multi-session-plugins 2026.6.1, and must not retain stale global @openclaw/acpx
Root Cause The assert treats all acpx as stale, but the current OpenClaw plugin registry might contain version-matched acpx. It should check the version instead of just existence
Fix Solution Adjust assert: allow version-matched acpx, only reject stale/global mismatched versions
Verification Status Committed c11f51b. Still need to observe assert results after plugin registry refresh in a full deployment

TC-MAC-023: LiteLLM Python 3.13/3.14 Mixed Use

Item Content
Trigger File roles/vhosts/litellm/defaults/main.yml, roles/vhosts/litellm/tasks/main.yml
Trigger Phenomenon On macOS, Homebrew Python mixes with system/other Python versions. LiteLLM dependencies might be installed into inconsistent interpreters or site-packages, causing instability in prisma generate and service startup
Root Cause Early installation paths didn't enforce independent venvs, and the macOS environment might simultaneously possess Python 3.13 and 3.14
Fix Solution LiteLLM runtime fixed to use Python 3.13 to create an isolated venv: ~/.local/share/litellm/venv; pip, litellm, and prisma are all executed from this venv
Verification Status Committed 71ebe64. Still need a full deployment to verify service startup and prisma generate

TC-MAC-024: LiteLLM Dependency Install is Slow and Public Network Download Easily Interrupts

Item Content
Trigger File roles/vhosts/litellm/tasks/main.yml, roles/vhosts/litellm/defaults/main.yml
Trigger Error Ensure LiteLLM and DB dependencies are installed took up to ~581 seconds, then failed due to download interruptions from GitHub archive or PyPI wheel like IncompleteRead / curl 18 / EOF
Root Cause The litellm[proxy] dependency tree is large, containing huge packages like polars-runtime-32, cryptography, boto3, mcp. Direct online pip install is both slow and relies on network stability. Changing git+https to GitHub archive solved git clone EOF, but large wheel download interruptions are still unavoidable
Fixed ① Default install source changed from git+https to GitHub archive; ② Added PIP_CACHE_DIR and longer timeout; ③ Probe for installed litellm/prisma/psycopg2-binary before install, and skip duplicate installs using .install-spec marker; ④ Added offline runtime workflow to ai-workspace-services/litellm to pre-build target distribution wheelhouses
Current Status Online install path has mitigated but not eradicated network risks; the true long-term solution is to have all-in-one prioritize consuming the wheelhouse within litellm-runtime-<distro>-<version>-<arch>.tar.gz
To Verify Need to trigger and confirm offline-package-litellm-runtime.yaml generates a release in GitHub Actions, and xworkspace-console/scripts/create-ai-workspace-offline-package.sh can pull the matching runtime asset from ai-workspace-services/litellm

TC-MAC-025: LiteLLM runtime release Connection with all-in-one Offline Package

Item Content
Trigger File ai-workspace-services/litellm/.github/workflows/offline-package-litellm-runtime.yaml, xworkspace-console/scripts/create-ai-workspace-offline-package.sh, xworkspace-console/scripts/ai-workspace-offline-install.sh
Contract The console offline package script will download litellm-runtime-${DISTRO_ID}-${DISTRO_VERSION}-${ARCH}.tar.gz under LITELLM_RUNTIME_RELEASE_REPO=ai-workspace-services/litellm, extract it, and copy packages/pip, optionally packages/python, and metadata/runtime.env
Completed Added workflow to litellm repo, with matrix covering Debian 11/12/13 and Ubuntu 22.04/24.04/26.04 on amd64/arm64; Ubuntu 26.04 additionally packages portable Python 3.13.14; SHA256SUMS merged in release
To Do Need to verify if GitHub Actions actual runs succeed; need to confirm release tag naming matches console side latest-runtime resolution; need to practically test if metadata/litellm-runtime.env correctly injects LITELLM_PACKAGE_SPEC in the offline all-in-one package

TC-MAC-026: uninstall purge Needs to Print Deleted Paths

Item Content
Trigger Command curl -sfL https://install.svc.plus/ai-workspace | bash -s -- uninstall purge
Requirement The purge mode not only deletes local state but should also explicitly print paths to be/already deleted, facilitating user confirmation of the cleanup scope
Current Status Identified as a to-do item; need to extract a unified purge_path / purge_matching_paths helper in the uninstall/purge branch of setup-ai-workspace-all-in-one.sh, outputting existing paths before deletion, and also outputting skipped/absent when not existing
Involved Paths macOS includes at least ~/.ai_workspace_auth_token, ~/.vault_password, ~/.openclaw, /tmp/xworkspace-core-skills, /tmp/xworkmate-bridge, /tmp/ai-workspace-deploy; Linux additionally includes /opt/ai-workspace, /etc/ai-workspace, user systemd units, etc.

TC-MAC-027: Non-Source Code Formal Directory Cleanup

Item Content
Trigger Phenomenon Generated directories like ai-workspace-all-in-one-offline-ubuntu-22.04-amd64/ appear in the workspace
Root Cause Offline package build/extraction products entered the development workspace, easily mistaken for source code directories
Handling Principle Generated products that don't belong to the formal directories of the source repository should be cleaned up from the workspace; offline package output should be placed in explicit dist/, release artifact, or temp directories, and shouldn't mix into the source root
To Do Need to append a repo-level sweep subsequently: confirm git status --ignored for xworkspace-console, playbooks, and litellm respectively, clean untracked offline package directories, and append .gitignore as needed

TC-MAC-028: LiteLLM Dependency Version Detection One-Line Python Syntax Error Causes set_fact Crash

Item Content
Trigger File roles/vhosts/litellm/tasks/main.yml (Inspect/Decide task)
Trigger Error TASK [litellm : Decide whether LiteLLM dependencies need installation]the field 'args' ... could not be converted to dict.. Expecting value: line 1 column 1 (char 0)
Root Cause The "Inspect installed LiteLLM dependency versions" detection script was written as multi-line Python, but under YAML >- folding, all newlines were compressed into spaces, turning for package in packages: try: ... except: into an illegal single line → SyntaxError. failed_when: false swallowed the failure causing empty stdout, and subsequently set_fact's from_json('') crashed. default('{}') does not replace empty strings (only undefined)
Fix Solution Changed detection to a true one-liner program (using dict/list comprehensions for importlib.metadata.distributions(), connected by semicolons); decision set_fact uses default('{}', true), meaning empty/illegal output degrades to "installation needed" instead of aborting the playbook. Commit ce2070e

TC-MAC-029: prisma generate Cannot Find prisma-client-py Generator

Item Content
Trigger File roles/vhosts/litellm/tasks/main.yml (Generate Prisma Python Client)
Trigger Error Error: Generator "prisma-client-py" failed: /bin/sh: prisma-client-py: command not found
Root Cause prisma generate invokes the prisma-client-py generator as a /bin/sh subprocess, its console script is installed in the venv's bin directory. However, the task called prisma with an absolute path but didn't put the venv bin into PATH, so the default command PATH couldn't resolve the generator
Fix Solution Added environment.PATH to this task, prepending {{ litellm_venv_dir }}/bin (then Homebrew prefix), making the generator subprocess resolvable. Commit bbf5260

TC-MAC-030: QMD LaunchAgent References Undefined nodejs_version

Item Content
Trigger File roles/vhosts/qmd/templates/qmd.plist.j2
Trigger Error TASK [qmd : Deploy QMD LaunchAgent]AnsibleUndefinedVariable: 'nodejs_version' is undefined
Root Cause The plist's PATH hardcoded ~/.nvm/versions/node/{{ nodejs_version }}/bin, but under Homebrew deployment, nodejs_version was never defined (same anti-pattern as TC-MAC-005)
Fix Solution QMD is a bun binary, and the Linux user unit already uses .bun/bin:.local/bin:...; plist PATH aligned to {{ qmd_home }}/.bun/bin:{{ qmd_home }}/.local/bin:/opt/homebrew/bin:..., removing nvm/nodejs_version dependencies. Commit d903396

TC-MAC-031: QMD better-sqlite3 Native Module Node ABI Mismatch

Item Content
Trigger File roles/vhosts/qmd/tasks/main.yml (npm install / npm run build / Validate QMD status)
Trigger Error TASK [qmd : Validate QMD status]Error: ... better_sqlite3.node was compiled against a different Node.js version using NODE_MODULE_VERSION 137. This version of Node.js requires NODE_MODULE_VERSION 115 (ERR_DLOPEN_FAILED)
Root Cause better-sqlite3 was compiled with node@24 (ABI 137), but the validate-status task didn't fix PATH. The user PATH's nvm Node 20 (ABI 115) ranked before Homebrew, causing inconsistent Node ABI between runtime and build
Fix Solution The three tasks (npm install / npm run build / validate-status) under Darwin use {{ '/opt/homebrew/bin:/usr/local/bin:' if ansible_os_family == 'Darwin' else '' }}{{ ansible_env.PATH }} to fix node@24, ensuring build and runtime ABI consistency (consistent with plist); Linux PATH unchanged. Commit 6091b9d

TC-MAC-032: XFCE/XRDP Linux Desktop Stack Fails to run apt on macOS

Item Content
Trigger File setup-xfce-xrdp.yamlroles/vhosts/xfce_desktop_minimal_runtime
Trigger Error TASK [xfce_desktop_minimal_runtime : Update apt cache][Errno 2] No such file or directory: b'update' (macOS lacks apt)
Root Cause XFCE + XRDP is a Linux remote desktop stack (apt/systemd), which is meaningless on macOS that already has a native GUI, but all-in-one still ran this play down to Darwin
Fix Solution Both include_roles in setup-xfce-xrdp.yaml gained when: ansible_os_family != 'Darwin', skipping the entire stack on macOS; Linux unchanged. Commit ef67c61

TC-MAC-033: LiteLLM DATABASE_URL Password Not Percent-Encoded Causes Prisma P1013

Item Content
Trigger File roles/vhosts/litellm/defaults/main.yml (litellm_database_url)
Trigger Phenomenon Deployment "succeeds" (ansible failed=0) but service summary shows LiteLLM : inactive (not detected;http:000), launchd exit code non-0, port 4000 not listening; litellm.err.log repeats Error: P1013: The provided database string is invalid. invalid port number in database URL
Root Cause Unified auth token generated via openssl rand -base64 may contain /, +, =; when directly concatenated into userinfo of postgresql://litellm:<token>@host:port/db, / truncates the URL authority, failing port parsing, proxy fails to start, 4000 not listening. Health check failed_when: false masked it, ansible still reported success
Fix Solution Percent-encode only the password in DATABASE_URL (added litellm_database_password_urlencoded, explicit replace chain prioritizing %; Jinja urlencode doesn't escape / so it's unusable). The actual DB user password in provision-database and LITELLM_DB_PASSWORD keeps original text, URL format decoded matches original (round-trip verified), auth remains unchanged. Commit 9926a46
Verification Method ansible failed=0 ≠ service available: need independent confirmation via launchctl list (Status 0), lsof -iTCP:4000 -sTCP:LISTEN, curl /health (401 means healthy, auth-gated)

Fix Dimension Summary

Dimension Involved Cases
Component acquisition method replacement (brew vs binary) TC-001
Privilege reduction (become: false) TC-002, TC-006, TC-007, TC-008, TC-009
User group adaptation (staff vs ubuntu) TC-003, TC-010
Directory path downgrade ($HOME vs /home/ubuntu, /opt, /etc) TC-004, TC-006, TC-009, TC-010, TC-012, TC-013
Post-clone patch injection TC-013, TC-014
Linux baseline total skip (skip Linux baseline on Darwin) TC-014, TC-032
brew dep supplement + PATH injection (jq via brew, Homebrew on PATH) TC-015
Package manager bypass (skip apt on Darwin) TC-008, TC-010, TC-032
Template variable decoupling (remove nvm/nodejs_version) TC-005, TC-030
Path space compatibility (argv vs string) TC-011
Homebrew module bypass (command brew + PATH) TC-018, TC-019
venv/Node subprocess PATH injection (resolve generator/native ABI) TC-029, TC-031
Node ABI consistency (build == runtime node@24) TC-031
macOS launchd user service TC-021
handler trigger condition convergence TC-020
Python venv isolation and pip cache TC-023, TC-024
One-line template/folding syntax robustness (>- folding, default(.,true)) TC-028
Connection string password percent encoding (URL-encode secrets) TC-033
Offline runtime wheelhouse TC-025
purge observability TC-026