Compare commits

372 Commits

Author SHA1 Message Date
55a05da3bf
feat: add XWorkmate install redirect (#23)
Co-authored-by: Haitao Pan <manbuzhe2009@qq.com>
2026-06-29 15:47:04 +08:00
477b52c516
fix(acp_server_opencode): detect opencode CLI at deploy time (portable across Debian/Ubuntu/macOS) (#22)
Stop assuming a fixed opencode path. Probe the real binary with 'command -v'
using the role PATH, then feed the resolved path to both the systemd unit and
the launchd plist (plist now also passes -opencode-bin). Falls back to the
OS-aware default when opencode is not yet installed.

Also remove the dead acp-bridge.service.j2 template: it was not deployed by any
task and referenced two undefined vars (acp_opencode_bridge_disabled_binary_path,
acp_opencode_bridge_opencode_binary_path) — a hardcoding landmine.

Co-authored-by: Haitao Pan <manbuzhe2009@qq.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 15:31:54 +08:00
4364786465
fix(acp_server_opencode): service PATH + bin var + surface adapter crash in validate (#21)
ACP readiness probe returned 000 for the full retry window on
xworkmate-bridge-ubuntu-26 (nothing listening = adapter crash-loop), but the
play aborted at the probe so the real cause never reached the CI log.

- systemd unit: add Environment=PATH ({{ acp_opencode_path }}, parity with the
  launchd plist) so the lazily-spawned opencode/node CLI resolves; replace the
  hardcoded --opencode-bin /usr/bin/opencode with {{ acp_opencode_binary_path }}
  ({{ npm_global_bin }}/opencode), matching the gemini/codex roles and macOS.
- validate.yml: wrap the readiness probe in block/rescue that dumps systemctl
  status + journalctl on failure, so the adapter crash reason is visible.
- fix latent undefined var in the summary (acp_opencode_adapter_http ->
  acp_opencode_adapter_probe), which would have errored once the endpoint came up.

Co-authored-by: Haitao Pan <manbuzhe2009@qq.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 15:25:32 +08:00
e953d87f07
ci: add release/* branch source validation workflow (#19)
release/* only accepts PRs from hotfix/* branches or PRs carrying a cherry-pick/backport label.
See iac_modules/docs/tldr-github-branch-model.md for details.

Co-authored-by: Haitao Pan <manbuzhe2009@qq.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 12:12:33 +08:00
Haitao Pan
d806ba9d3d fix: update litellm mainstream models registration and gateway defaults 2026-06-27 14:49:18 +08:00
a2ce5b9d05 fix(cloudflare): prefer DNS scoped token 2026-06-27 13:48:19 +08:00
19a3c9f72a fix(macos): select architecture Homebrew explicitly 2026-06-27 12:45:34 +08:00
Haitao Pan
5c74feb860 fix(cloudflare_dns): prefer CLOUDFLARE_API_TOKEN over CLOUDFLARE_DNS_API_TOKEN
Align the DNS role's token resolution with the rest of the stack, which
exports the generic CLOUDFLARE_API_TOKEN. The dedicated *_DNS_API_TOKEN now
acts as the fallback, both for play vars and the environment lookup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 11:31:08 +08:00
Haitao Pan
9b59c89d80 fix(console): expose Homebrew Go to macOS API service 2026-06-27 09:18:03 +08:00
Haitao Pan
abee312617 fix(xfce/nodejs): explicit nodejs_version fallback (omit sentinel leaked into repo URL)
Previous default(omit) was wrong: in include_role vars, omit does not fall back
to the role default — it injects the omit placeholder, which rendered as
node_<<Omit>>.x in the NodeSource apt repo URL and failed apt update. Use an
explicit fallback to the nodejs role's documented default (22.22.3). Avoids both
the 2.19 self-reference recursion and the omit-sentinel leak.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 10:56:36 +08:00
Haitao Pan
cd9a783de7 fix(xworkmate_bridge): align Caddy SSE timeouts with bridge 60min max wait
Caddy /acp* used read/write_timeout 30m while the bridge max gateway wait is
60min, so long tasks had their SSE killed at the edge (ACP_HTTP_CONNECTION_CLOSED)
while OpenClaw kept running. /api*, /artifacts/* and / also lacked flush_interval
and long timeouts, making polling/streaming fragile.

- T1: introduce xworkmate_bridge_acp_stream_timeout (70m = 60min cap + grace),
  acp_dial_timeout, acp_upstream_keepalive; drive /acp* read/write_timeout from it.
- T2: apply flush_interval -1 + the same long timeouts to /api*, /artifacts/*, /.
- Update validate.yml assertions to reference the vars instead of hardcoded 30m.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 10:49:01 +08:00
Haitao Pan
8fcff61855 fix(ai_agent_runtime): resolver must verify browser actually runs, skip disabled stub
The Chromium resolver accepted any candidate that merely existed (command -v /
-x), so it selected xfce's intentionally-disabled /usr/local/bin/chromium stub
(exits 126 "Chromium is disabled, use google-chrome") over the working
google-chrome. The later "Check chromium version" verify then failed rc=126.
Latent on fresh hosts (depends on role ordering vs the stub install) and
deterministic on any re-run. Now require `<candidate> --version` to succeed
before accepting, so the stub is skipped and google-chrome is resolved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 10:42:06 +08:00
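Illustration for 8fcff61855: a minimal Python sketch of the "must actually run" check, not the role's Ansible tasks. The function name `resolve_browser` and the candidate order are assumptions made for this example; only the rule itself (accept a candidate only if `<candidate> --version` succeeds) comes from the commit message.

```python
import shutil
import subprocess


def resolve_browser(candidates):
    """Return the first candidate whose `--version` exits 0, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue  # missing or not executable
        try:
            result = subprocess.run(
                [path, "--version"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=10,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue  # could not be executed, or hung
        if result.returncode == 0:
            return path  # exists AND runs, so a disabled stub (rc=126) is skipped
    return None


if __name__ == "__main__":
    # Candidate order is hypothetical.
    print(resolve_browser(["chromium", "google-chrome", "google-chrome-stable"]))
```

An existence-only check (`command -v` or `-x`) would stop at the first `continue` and accept the stub; the extra `--version` run is what filters it out.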
Haitao Pan
5d00d700ca fix(xfce/nodejs): drop self-referential nodejs_version (Ansible 2.19 recursion)
include_role passed `nodejs_version: "{{ ai_agent_runtime_nodejs_version |
default(nodejs_version) }}"` — a var named nodejs_version whose template
references nodejs_version itself. Ansible 2.19+'s lazy templating detects the
self-reference in the AST and fails the nodejs role's `nodejs_version_major`
set_fact with "Recursive loop detected: maximum recursion depth exceeded".
Use default(omit) so the nodejs role's own default applies when the
ai_agent_runtime override is absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-26 10:34:52 +08:00
Haitao Pan
50dba213ee feat: implement postgresql.svc.plus docker deployment role 2026-06-26 10:00:00 +08:00
Haitao Pan
c62386f30c fix(postgres): own PGDATA by container uid so re-runs don't break access
On re-run, "Ensure compose directories exist" reset the bind-mounted data dir
to root:root 0700. The official postgres image only chowns/initdb's an EMPTY
PGDATA, so a non-empty data dir stayed root-owned while the backend runs as uid
999 -> "could not open file global/pg_filenode.map: Permission denied" (pg_isready
still passes, masking it; ALTER USER / real queries fail).

Split the dir task: compose project dir stays root:root; data dir is created
owned by postgresql_container_uid/gid (default 999), idempotent across re-runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:42:11 +08:00
Haitao Pan
29e60383e3 fix(xfce_browser): allow_downgrade on Chrome install to avoid downgrade hard-fail
When a host's Chrome apt repo already carries a newer build than a pinned version,
apt refuses with "Packages were downgraded and -y was used without
--allow-downgrades". Set allow_downgrade: true so an explicit (older-but-available)
pin installs cleanly. Complements the empty-default fix (e174e8b): default path
installs latest, pinned path now also robust.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:34:35 +08:00
Haitao Pan
e174e8bcfa fix(xfce_browser): stop pinning Chrome build + fix broken availability regex
Deploy failed on ubuntu26 with "no available installation candidate for
google-chrome-stable=149.0.7827.114-1": Google's apt repo only ever carries the
current stable, so any pinned build vanishes within weeks.

Two fixes:
- defaults: xfce_google_chrome_version "" (install latest google-chrome-stable);
  pin is opt-in and now safe (auto-falls back to latest when the pin is gone).
- browser.yml: the madison availability guard used POSIX [[:space:]], which
  Python re does not support, so it never matched ' | ' separators. Replace with
  \s — verified: empty->latest, pinned+available->pin, pinned+gone->latest.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-25 22:14:46 +08:00
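Illustration for e174e8bcfa: a short Python sketch of why the POSIX class never matched. The sample line is a hypothetical `apt-cache madison` style row and the two patterns are simplified stand-ins for the guard in browser.yml, which is not shown in this log.

```python
import re
import warnings

# Hypothetical madison-style row; the real guard's input may differ.
line = "google-chrome-stable | 149.0.7827.114-1 | https://dl.google.com/linux/chrome/deb stable/main amd64 Packages"

with warnings.catch_warnings():
    # Python's re has no POSIX classes: "[[:space:]]" is read as the set
    # {'[', ':', 's', 'p', 'a', 'c', 'e'} followed by a literal ']'.
    # Newer Python versions also warn about a "possible nested set" here.
    warnings.simplefilter("ignore", FutureWarning)
    posix_style = re.compile(r"[[:space:]]\|[[:space:]]")

python_style = re.compile(r"\s\|\s")

posix_matches = posix_style.search(line) is not None
python_matches = python_style.search(line) is not None
print(posix_matches, python_matches)  # False True
```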
Haitao Pan
5aadb4f0dc fix(xfce): fall back when pinned chrome apt version is unavailable 2026-06-25 20:32:47 +08:00
Haitao Pan
c9919284e0 fix(bridge): avoid embedded templates in caddy assertion 2026-06-25 20:26:38 +08:00
Haitao Pan
5984a75643 fix(litellm): provision Python 3.13 via uv when system python >=3.14
litellm's pinned fork requires Python <3.14; Ubuntu 26.04 ships 3.14 with no
3.13/3.12 in apt, so the venv pip install fails ('requires a different Python').
When the bootstrap interpreter is >=3.14, install a standalone Python 3.13 via
uv, rebuild the venv with it, and proceed. Debian 13 (3.13) is unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 21:23:27 +08:00
Haitao Pan
c7bc68a6dc fix(acp_server_opencode): robust curl-retry for ACP endpoint readiness
The uri probe ran 1s after the service (re)start while the adapter still accepts
TCP but doesn't yet answer (read hangs); uri's default 30s timeout + retries/until
did not actually loop on a connection timeout, so it failed after one attempt.
Replace with a curl retry loop (5s per attempt, up to ~30 tries) — the adapter
answers acp.capabilities in ~4ms once ready.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 21:21:37 +08:00
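Illustration for c7bc68a6dc: a minimal Python sketch of a bounded readiness loop with a per-attempt timeout, the same shape as the curl retry described above. The helper names, the 1s pause between attempts, and the "HTTP 200 means ready" rule are assumptions for this example; the commit only states 5s per attempt and roughly 30 tries.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe_http(url, timeout=5.0):
    """One attempt: True only if the endpoint answers HTTP 200 within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connection, read timeout, or a broken reply: not ready yet.
        return False


def wait_until_ready(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `probe` up to `attempts` times; return the successful attempt number or None."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

Usage would look like `wait_until_ready(lambda: probe_http(url))` with `url` set to the adapter endpoint. The point of the per-attempt timeout is that a hung read counts as one failed attempt rather than ending the whole wait.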
Haitao Pan
609a88ddcf feat(bridge): fail fast when bridge domain is empty/non-FQDN under Caddy exposure
Non-empty pass-through check: xworkmate_bridge_domain feeds /etc/hostname and the
caddy site name; an empty/non-FQDN/127.0.0.1 value yields an invalid Caddyfile.
Assert a valid FQDN when caddy_enabled (public ingress), with a clear remediation
message (set XWORKMATE_BRIDGE_DOMAIN or provide CMDB service_domains).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 20:50:19 +08:00
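Illustration for 609a88ddcf: a minimal Python sketch of an FQDN check that rejects the values named above (empty, non-FQDN, 127.0.0.1). The function name and the exact rules (at least two labels, no IP literals, no `localhost`) are assumptions for this example and may be stricter or looser than the role's actual assertion.

```python
import ipaddress
import re

# One DNS label: 1-63 letters, digits or hyphens, no leading/trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_fqdn(value):
    """Return True if `value` looks like a fully qualified domain name."""
    name = (value or "").strip().rstrip(".")
    if not name or len(name) > 253 or name.lower() == "localhost":
        return False
    try:
        ipaddress.ip_address(name)
        return False  # IP literals such as 127.0.0.1 are not FQDNs
    except ValueError:
        pass
    labels = name.split(".")
    if len(labels) < 2:
        return False  # a bare hostname is not fully qualified
    return all(_LABEL.match(label) for label in labels)


print(is_valid_fqdn("bridge.example.com"), is_valid_fqdn("127.0.0.1"), is_valid_fqdn(""))
# True False False
```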
Haitao Pan
40b7975061 fix(common): install fail2ban via apt on Debian so module_defaults lock_timeout renders
Same class as bridge/litellm: ansible.builtin.package dispatched to apt inherits
the play's templated module_defaults.apt.lock_timeout un-rendered -> int conversion
error -> on-host bootstrap aborts before litellm/qmd. Use apt on Debian, keep
package for non-Debian (yum/dnf doesn't inherit the apt default).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 16:05:40 +08:00
Haitao Pan
3709074916 feat(bridge): set host FQDN + caddy site from XWORKMATE_BRIDGE_DOMAIN or CMDB service_domains
- xworkmate_bridge_domain falls back to the first CMDB service_domains entry
  (inventory hostvar / pipeline-injected env) before ai_workspace_public_domain.
- New task sets the host's /etc/hostname (and running hostname) to that FQDN on
  Linux when it's a valid FQDN — never 127.0.0.1/localhost. The caddy site
  (xworkmate-bridge-site.caddy.j2) already uses the same var.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:56:30 +08:00
Haitao Pan
c3a0e40566 fix(bridge,litellm): use apt on Debian so module_defaults lock_timeout renders
The runtime plays set module_defaults.apt.lock_timeout to a templated value.
When a prerequisite task uses ansible.builtin.package (which dispatches to apt
on Debian), that templated default is NOT rendered and the literal
'{{ ai_workspace_apt_lock_timeout | default(900) | int }}' reaches apt ->
'lock_timeout is of type str ... cannot be converted to an int' -> the whole
on-host bootstrap aborts at the xworkmate-bridge prereq, before litellm/qmd
ever deploy (hence they were never up).

Fix: install prereqs via ansible.builtin.apt on Debian/Ubuntu (template renders
like every other apt task); keep ansible.builtin.package for non-Debian Linux
(dispatches to yum/dnf, which doesn't inherit the apt default).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 15:54:09 +08:00
Haitao Pan
c3f3b8ac8e refactor(agent_skills): run on target host, git-clone sources, drop delegate_to localhost
Make the role work identically under both execution models:
- local/pull (curl|bash -> ansible-playbook -c local; localhost == host)
- remote controller (ansible-playbook -i inventory over ssh; tasks run on host)

Changes:
- Remove ALL delegate_to: localhost (the old raw 'command: rsync' detected
  local-vs-remote via ansible_connection, but delegate_to localhost forced it
  to 'local', so the user@host push branch was dead code -> remote runs wrote
  to the controller's /root and failed).
- Acquire xworkspace-core-skills via ansible.builtin.git clone ON THE HOST
  (most universal/cross-platform), instead of requiring a controller-side dir.
- Merge core skills into the canonical dir with ansible.builtin.copy
  (remote_src, host-local) instead of raw rsync; installer adapters install
  directly into the canonical dir on the host.
- Drop rsync-only vars/excludes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 14:57:49 +08:00
Haitao Pan
2ef144d572 fix(console): serve dashboard/dist via local python http.server (not npm/caddy)
Prebuilt runtime ships only dashboard/dist (no package.json) so npm run
preview ENOENT-crash-loops (254). console is a local-only static backend on
127.0.0.1:17000 (dashboard is a routerless SPA); serve it with python3
-m http.server on both Linux (console.service) and macOS (console.plist) —
no second caddy (avoids clashing with the system caddy on :80; console is
local-only and not proxied by default). Gate the apt caddy install on
caddy_enabled (true on public-IP Linux VPS for the bridge ingress; macOS
installs no caddy).

Verified: debian13 + ubuntu26.04 console.service active serving 17000=200;
macOS python3 serves the same dist locally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 09:44:01 +08:00
Haitao Pan
3505ff1c31 fix(ai-workspace): deploy robustness on Debian13/Ubuntu26.04 (py3.13)
- setup-xworkspace-console.yaml:
  - xworkspace_console_user follows ansible_env.USER (was hardcoded ubuntu;
    mismatched home=/root on root connections -> systemd link 'src does not exist')
  - runtime apt task async/poll (xfce4 desktop install dropped the SSH session)
  - api_dir -> bin/ to match prebuilt runtime manifest (apiBinary: bin/xworkspace-api;
    was api/ -> 203/EXEC crash loop)
- roles/ai_agent_runtime/tasks/{main,docs,fonts,browser}.yml: apt lock_timeout
  (texlive/pandoc raced cloud-init/unattended-upgrades for the dpkg lock)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 03:02:43 +08:00
Haitao Pan
a5e19eff60 chore: qmd version bump, macOS container runtime deps, ignore inventory pycache
- roles/vhosts/common: add docker/docker-compose/colima to macOS brew deps
  (headless container runtime for qmd PG memory-bridge tests)
- roles/vhosts/qmd: bump qmd_version
- .gitignore: ignore inventory/__pycache__/

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 21:01:57 +08:00
Haitao Pan
df48cb4f5a feat(inventory): add Terraform CMDB dynamic inventory for ai-workspace
Reads cmdb.json produced by iac_modules vultr-vps/envs/ai-workspace
generate.py and exposes hosts/groups/hostvars to Ansible, linking IaC
provisioning to playbook deploys (terraform_cmdb.py).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 20:57:58 +08:00
Haitao Pan
099a144a9e fix(xworkmate-bridge): define missing xworkmate_bridge_caddy_base_dir
xworkmate_bridge_obsolete_caddy_fragment_paths references
xworkmate_bridge_caddy_base_dir, but the var was never defined, so the
'Inspect deprecated ACP Caddy fragment' task aborted with
'xworkmate_bridge_caddy_base_dir is undefined'. Define it from the global
caddy_config_dir (consistent with the role's other caddy paths), which is
already OS-aware (/etc/caddy on Linux, Homebrew prefix on macOS).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:42:20 +08:00
Haitao Pan
f5a5979439 fix(acp-gemini): create runtime dirs so service WorkingDirectory exists
acp-gemini.service sets WorkingDirectory={{ acp_gemini_workdir }} (~/.gemini)
but the role never created it, so systemd failed at step CHDIR (status
200/CHDIR), the adapter never bound 127.0.0.1:8791, and the CORS preflight
validation failed after 30 retries. Mirror the opencode role: pre-create the
home, .gemini workdir, XDG config and state dirs owned by the service user.
Linux/Debian only (guarded != Darwin); macOS uses the launchd path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:31:38 +08:00
Haitao Pan
e5fc29fa8a fix(console): download runtime from deterministic latest-runtime tag
The online runtime download used releases/latest/download, which GitHub
resolves to whichever release holds the 'Latest' flag. The console repo also
publishes offline-ai-workspace-* build releases that take that flag and carry
no console runtime asset -> HTTP 404 on the online/Debian path. Point at the
stable latest-runtime release (published by the console-runtime workflow) and
add a bounded download retry. The env-provided archive path still wins via the
existing when-guard, so offline/bundled installs are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:19:03 +08:00
Haitao Pan
9e81f65a62 fix(openclaw): pull multi-session plugin runtime from deterministic runtime-latest asset
The download used releases/latest/download, which GitHub resolves to the
human-facing v0.1.12 tag (no runtime asset) -> HTTP 404, failing the deploy
on Ubuntu 26.04 (and any platform). Point at the stable runtime-latest
release published by the plugin repo's runtime-release workflow, and add a
bounded retry around the download for transient network errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 14:03:23 +08:00
e0bfc765bf feat(litellm): make model registration idempotent via fallback to /model/update 2026-06-23 13:43:42 +08:00
4e183d2d44 fix(litellm): resolve os.environ variables locally before registering models to DB 2026-06-23 13:27:09 +08:00
28df3b59d6 feat(openclaw): conditionally render default UI models and providers based on active API keys 2026-06-23 13:09:56 +08:00
a0d59c0af1 feat(openclaw): adopt native provider simulation pointing to litellm gateway 2026-06-23 12:42:04 +08:00
25b8204b7b fix(openclaw): use hyphens for litellm models to prevent provider intercept 2026-06-23 12:21:45 +08:00
6e260a3425 feat(litellm): ensure deepseek-chat and deepseek-reasoner are registered 2026-06-23 12:18:24 +08:00
e7c96675ff feat(litellm): update model registrations and gateway configurations with API key gating 2026-06-23 11:04:21 +08:00
Haitao Pan
01f1499a60 feat(ai-workspace): consume prebuilt console runtime for final deployment
The macOS console API previously ran via `go run .`, which fails under
launchd's minimal PATH (no `go`) and recompiles on every launch. Switch to
the same prebuilt-runtime consumption model the bridge/qmd/litellm runtimes
already use.

The ai-workspace role now does final deployment only (never builds):
- download xworkspace-console-runtime-<os>-<arch>.tar.gz (incl. darwin-arm64)
  from the latest-runtime release, or use an offline-staged archive via
  XWORKSPACE_CONSOLE_RUNTIME_ARCHIVE;
- unpack to a per-user system dir (~/.local/share/xworkspace-console),
  idempotent via a sha256 marker;
- read manifest.json to resolve the prebuilt API binary and assert it is a
  present, executable native binary;
- on macOS, deploy a LaunchAgent that sources portal.env and execs the
  prebuilt binary directly — no go, no Homebrew, no PATH games.

The Go API is pure-Go (no cgo), so CI cross-compiles darwin-arm64 cleanly;
this role only consumes that artifact. Validated end-to-end on darwin-arm64:
packaged binary serves :8788 (200 with token, 401 without) under launchd.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 17:04:55 +08:00
Haitao Pan
a5850cfcee fix(acp_server_gemini): revert incompatible adapter command syntax and update args for antigravity-cli 2026-06-22 13:59:52 +08:00
Haitao Pan
2a85be5c9b fix(xworkmate_bridge): remove obsolete IMAGE variable causing undefined errors 2026-06-22 13:55:14 +08:00
Haitao Pan
32e00a8617 fix(litellm,validation): refine model registration and add cross-platform service validation 2026-06-22 13:52:05 +08:00
Haitao Pan
0ac424f00e Merge branch 'xworkspace-portal-dashboard-17000'
# Conflicts:
#	setup-xworkspace-console.yaml
2026-06-22 13:27:37 +08:00
Haitao Pan
1b2aea005a Merge branch 'refactor/upgrade-antigravity-cli'
# Conflicts:
#	roles/vhosts/acp_server_gemini/defaults/main.yml
#	roles/vhosts/acp_server_gemini/templates/gemini.plist.j2
2026-06-22 13:26:30 +08:00
Haitao Pan
93a3067ea4 Merge branch 'codex/openclaw-playbook-concurrency'
# Conflicts:
#	roles/vhosts/gateway_openclaw/templates/openclaw.json.j2
#	roles/vhosts/xworkmate_bridge/defaults/main.yml
2026-06-22 13:25:45 +08:00
Haitao Pan
9926a46f76 fix(litellm): percent-encode DB password in DATABASE_URL
LiteLLM crash-looped on macOS with Prisma `P1013: invalid port number in
database URL`. The shared auth token is generated by `openssl rand -base64`
and can contain '/', '+' or '='; injected raw into the DATABASE_URL
userinfo, a '/' truncates the authority so the port parses as invalid and
proxy startup fails (port 4000 never binds).

Percent-encode the password for the DATABASE_URL only, via an explicit
reserved-set replace chain ('%' first to avoid double-encoding) since
Jinja's urlencode leaves '/' unescaped. The DB user password stays raw in
provision-database and LITELLM_DB_PASSWORD, and the URL form decodes back
to the identical secret (verified round-trip), so authentication is
unchanged. No effect when no DB host is configured.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:56:56 +08:00
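Illustration for 9926a46f76: a small Python sketch of the replace-chain idea and of how a raw '/' in the password truncates the URL authority. The reserved set, the sample secret, and the host `db.example` are assumptions for this example, and Python's `urlsplit` stands in for Prisma's URL parser, whose exact behavior may differ.

```python
from urllib.parse import quote, unquote, urlsplit

# '%' comes first so the '%' introduced by later replacements is not
# encoded a second time.
_RESERVED = ["%", "/", "+", "=", "@", ":", "?", "#"]


def encode_userinfo(secret):
    """Percent-encode the reserved characters of a URL userinfo component."""
    for ch in _RESERVED:
        secret = secret.replace(ch, "%{:02X}".format(ord(ch)))
    return secret


password = "ab/cd+ef=="  # hypothetical base64-style secret
raw_url = "postgresql://litellm:" + password + "@db.example:5432/litellm"
safe_url = "postgresql://litellm:" + encode_userinfo(password) + "@db.example:5432/litellm"

# Raw: the authority ends at the first '/', so the host is misparsed.
print(urlsplit(raw_url).hostname)  # litellm
# Encoded: host and port parse as intended.
print(urlsplit(safe_url).hostname, urlsplit(safe_url).port)  # db.example 5432
# Decoding the URL form gives back the identical secret.
print(unquote(encode_userinfo(password)) == password)  # True
```

For secrets made of alphanumerics plus the characters listed in `_RESERVED`, the chain produces the same result as `quote(secret, safe="")`.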
Haitao Pan
ef67c61cf7 fix(xfce): skip Linux XFCE/XRDP desktop stack on macOS
The all-in-one flow reached "Update apt cache" in the
xfce_desktop_minimal_runtime role on macOS and failed with
`[Errno 2] No such file or directory: b'update'` (no apt on Darwin).

XFCE + XRDP is a Linux remote-desktop stack and is meaningless on macOS,
which already has a native GUI. Guard both role includes in
setup-xfce-xrdp.yaml with `ansible_os_family != 'Darwin'` so the apt/systemd
tasks never run there. Linux behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:46:31 +08:00
Haitao Pan
6091b9dbcf fix(qmd): pin Homebrew node@24 for build and status on macOS
`qmd status` aborted with ERR_DLOPEN_FAILED — better-sqlite3 was compiled
against NODE_MODULE_VERSION 137 (node@24) but the validate-status task ran
under nvm's Node 20 (NODE_MODULE_VERSION 115), because the user's PATH puts
nvm node ahead of Homebrew and the task pinned no PATH.

Pin `/opt/homebrew/bin` (node@24) ahead of nvm on Darwin for the npm
install, npm build, and validate-status tasks so the native module is
built and loaded against one consistent Node ABI — the same node@24 the
launchd plist already uses. Linux PATH is left unchanged via an
ansible_os_family conditional.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:43:05 +08:00
Haitao Pan
d9033960fd fix(qmd): drop undefined nodejs_version from macOS LaunchAgent PATH
The QMD launchd plist hardcoded an NVM node path
(`~/.nvm/versions/node/{{ nodejs_version }}/bin`), but `nodejs_version` is
never defined in the Homebrew-based macOS deploy, so "Deploy QMD
LaunchAgent" aborted with `AnsibleUndefinedVariable: 'nodejs_version' is
undefined`.

QMD is a bun binary and the Linux user unit already uses
`.bun/bin:.local/bin:...`. Mirror that for the plist PATH and add the
Homebrew prefix (`/opt/homebrew/bin`) for the brew-installed node@24,
removing the nvm/nodejs_version dependency entirely (same remedy as the
console plist in TC-MAC-005).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:32:58 +08:00
Haitao Pan
bbf5260f0d fix(litellm): put venv bin on PATH for prisma generate on macOS
`prisma generate` invokes the `prisma-client-py` generator as a `/bin/sh`
subprocess, which is resolved via PATH. Even though the role calls the
absolute venv `prisma` binary, the generator console script lives in the
same venv bin dir that is not on the default command PATH, so generation
failed with "prisma-client-py: command not found" on macOS.

Add an `environment.PATH` that prepends the venv bin dir (plus Homebrew
prefixes) so the generator subprocess resolves.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:17:12 +08:00
Haitao Pan
ce2070e779 fix(litellm): repair macOS dependency version probe one-liner
The "Inspect installed LiteLLM dependency versions" probe was written as a
multi-line Python program under YAML `>-` folding, which collapses every
newline into a space. The resulting single logical line contained a
`for ... : try: ... except:` block, which is a SyntaxError. With
`failed_when: false` the failure was swallowed, leaving stdout empty, and
the subsequent `set_fact` crashed in `from_json('')` with
"Expecting value: line 1 column 1 (char 0)".

Rewrite the probe as a genuinely single-line program (dict/list
comprehensions over importlib.metadata.distributions(), joined by `;`),
and harden the decision `set_fact` with `default('{}', true)` so an empty
or malformed probe degrades to "install required" instead of aborting the
play.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:16:57 +08:00
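Illustration for ce2070e779: a minimal Python sketch of both halves of the fix. The `folded` string imitates what newline folding does to a multi-line loop, `one_liner` is one possible comprehension-based probe, and `parse_probe` mirrors the "empty or malformed output degrades gracefully" idea. All three names and the exact probe text are assumptions; the role's real probe is not shown in this log.

```python
import json

# Roughly what a folded multi-line program looks like: compound statements
# (for/try/except) cannot be chained on one logical line.
folded = "import json; for d in [1]: try: print(json.dumps(d)) except Exception: pass"

# A genuinely single-line program: simple statements joined by ';' plus a comprehension.
one_liner = (
    "import json, importlib.metadata as m; "
    "print(json.dumps({d.metadata['Name']: d.version for d in m.distributions()}))"
)


def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<probe>", "exec")
        return True
    except SyntaxError:
        return False


def parse_probe(stdout):
    """Parse probe output; empty or malformed output becomes an empty mapping."""
    try:
        return json.loads(stdout or "{}")
    except json.JSONDecodeError:
        return {}


print(compiles(folded), compiles(one_liner), parse_probe(""))  # False True {}
```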
f4a30b9e01 fix(litellm): resilient online dependency install
litellm[proxy] pulls large wheels (polars-runtime ~46MB) that break mid-stream
over slow/mirrored links with IncompleteRead, failing the deploy. Add pip
--retries/--resume-retries (resumes partial downloads) + longer timeout, tunable
via litellm_pip_* vars, and upgrade pip in the venv first so --resume-retries
(pip>=25.1) exists.
2026-06-22 02:42:51 +00:00
Haitao Pan
6a2f05f435 fix(litellm): skip redundant dependency installs 2026-06-21 22:34:34 +08:00
Haitao Pan
71ebe6444c fix(litellm): isolate runtime in Python 3.13 venv 2026-06-21 21:15:21 +08:00
Haitao Pan
c11f51b4c9 fix(openclaw): allow version-matched acpx plugin 2026-06-21 21:07:21 +08:00
Haitao Pan
f01e0bb15b fix(qmd): provision macOS LaunchAgent 2026-06-21 21:05:59 +08:00
Haitao Pan
09a39e69ee perf(openclaw): avoid unnecessary doctor repairs 2026-06-21 20:54:01 +08:00
Haitao Pan
02667f9e76 Merge remote-tracking branch 'origin/main' 2026-06-21 20:41:41 +08:00
Haitao Pan
65e45a4834 fix(vhosts): make macOS defaults and vault tasks platform aware 2026-06-21 20:41:32 +08:00
Haitao Pan
f231867593 Merge branch 'fix/xworkmate-windows-handler' into HEAD 2026-06-21 20:40:03 +08:00
Haitao Pan
1dd0ce2e04 fix(xworkmate_bridge): use correct Windows command module 2026-06-21 20:38:14 +08:00
Haitao Pan
9f04d4d9b5 perf(nodejs): configure USTC Homebrew mirrors to accelerate installation 2026-06-21 20:20:53 +08:00
Haitao Pan
4f87b67a4e feat(xworkmate_bridge): add Windows Scheduled Task deployment and skip Caddy on Windows 2026-06-21 20:18:11 +08:00
Haitao Pan
51d08cf9db perf(nodejs): disable homebrew auto update to speed up installation 2026-06-21 20:14:03 +08:00
Haitao Pan
aa3b4e8069 fix(gateway_openclaw): resolve npm path resolution, remove obsolete plugins via CLI, and make doctor non-interactive 2026-06-21 20:10:50 +08:00
Haitao Pan
85bad4155f fix(gateway_openclaw): resolve parse error on windows module in handlers 2026-06-21 19:57:37 +08:00
Haitao Pan
48ba854671 fix(litellm): use explicit postgres database for psql commands to prevent connection errors on macOS 2026-06-21 19:56:44 +08:00
Haitao Pan
fa04606542 feat(gateway_openclaw): run doctor --fix --force on restart 2026-06-21 19:54:58 +08:00
Haitao Pan
aedf457ddc feat(gateway_openclaw): add Windows tasks implementation 2026-06-21 19:53:26 +08:00
Haitao Pan
5f7bc697fc fix(playbooks): use include_tasks for windows and force node24 path for openclaw 2026-06-21 19:52:14 +08:00
Haitao Pan
284c3c43a3 feat(ai_agent_runtime): add Windows tasks implementation 2026-06-21 19:48:52 +08:00
Haitao Pan
340de0c4d8 feat(nodejs): add Windows tasks implementation 2026-06-21 19:43:16 +08:00
Haitao Pan
ae9f09d77f fix(openclaw): use 'run' command instead of positional config file in launchd plist 2026-06-21 19:42:13 +08:00
Haitao Pan
a170671ffd fix(nodejs): add --ignore-dependencies to brew uninstall node 2026-06-21 19:32:50 +08:00
Haitao Pan
a35befd123 fix(nodejs): replace community.general.homebrew with shell command to bypass macOS version crash 2026-06-21 19:32:02 +08:00
Haitao Pan
2a8db4f79a fix(nodejs): actually run macOS node installation and fix variable name 2026-06-21 19:30:38 +08:00
Haitao Pan
aa59be12a0 fix(bridge): move macos.yml import before validation 2026-06-21 19:28:09 +08:00
Haitao Pan
4863c327cc fix(macOS): correct bridge and openclaw launchd CLI arguments 2026-06-21 19:22:48 +08:00
Haitao Pan
16ecda9e7d chore: remove temp scripts 2026-06-21 19:08:59 +08:00
Haitao Pan
32da386051 fix(macOS): unify notify strings in xworkmate_bridge and gateway_openclaw to trigger OS-agnostic listen topics rather than Linux-specific handler names 2026-06-21 19:08:43 +08:00
Haitao Pan
8d644f006e fix(macOS): rewrite remaining macOS handlers to use listen attribute instead of notify chaining for proper sequence during flush_handlers 2026-06-21 19:05:33 +08:00
Haitao Pan
f6ef2202b2 fix(macOS): correctly skip caddy and systemd validation tasks in xworkmate_bridge when not enabled on Darwin 2026-06-21 19:01:51 +08:00
Haitao Pan
73e33e3083 fix(macOS): correctly sequence launchctl unload and load by using listen rather than notify in handlers 2026-06-21 18:55:52 +08:00
Haitao Pan
3f1ece8601 refactor(macOS): overhaul handlers to support dynamic OS routing via listen feature 2026-06-21 18:52:37 +08:00
Haitao Pan
68206296f4 fix(macOS): use launchctl unload/load instead of stop/start in handlers to ensure plist updates are applied 2026-06-21 18:48:20 +08:00
Haitao Pan
a5db649802 fix(macOS): use dynamic npm global bin path for codex adapter 2026-06-21 18:47:39 +08:00
Haitao Pan
21e0ab9628 fix(macOS): define acp_hermes_path for launchd plist template 2026-06-21 18:41:02 +08:00
Haitao Pan
04bf000c78 fix(macOS): use dynamic variables instead of hardcoded ubuntu for hermes adapter user and paths 2026-06-21 18:39:07 +08:00
Haitao Pan
ffa357ac4e fix(macOS): add retries to acp validation uri tasks to mitigate launchd race conditions 2026-06-21 18:34:52 +08:00
Haitao Pan
fd142df681 fix(macOS): correct bridge adapter invocation syntax for acp endpoints 2026-06-21 18:27:22 +08:00
321fd81f37 fix(macos): define acp_opencode_path for launchd plist
opencode.plist.j2 exports PATH from acp_opencode_path, but the var was never
defined (codex/gemini define their *_path), causing AnsibleUndefinedVariable on
the 'Create launchd plist template for OpenCode ACP' task. Add
acp_opencode_npm_global_bin + acp_opencode_path mirroring gemini (macOS uses
~/.local/bin + Homebrew + system paths).
2026-06-21 10:13:12 +00:00
8e346889fe fix(macos): gate opencode Caddy conf-dir on manage_caddy
The OpenCode 'Ensure Caddy conf directory exists' task ran unconditionally
(owner root), unlike codex which gates on acp_*_manage_caddy. On macOS (caddy
off by default, manage_caddy=false) it templated the caddy path and tried to
create/chown the dir as root. Gate it on acp_opencode_manage_caddy like codex,
so macOS single-host skips Caddy entirely.
2026-06-21 10:04:52 +00:00
c14962d572 fix: quote strings in caddy_base_dir Jinja (invalid unquoted expr)
caddy_base_dir was {{ /opt/homebrew/etc/caddy if ansible_os_family == Darwin
else /etc/caddy }} with unquoted paths/value, so Jinja parsed '/' as division
-> 'unexpected /' templating error (hit on acp_opencode). Quote the literals:
'{{ "/opt/homebrew/etc/caddy" if ansible_os_family == "Darwin" else "/etc/caddy" }}'
across the 9 affected roles.
2026-06-21 09:55:12 +00:00
e51e07f210 fix(macos): guard acp apt installs to Debian-family with non-empty packages
The opencode/gemini/hermes install tasks ran ansible.builtin.apt unconditionally,
so apt-get update fired even on macOS (no apt) and with empty package lists.
Gate them on 'packages | length > 0' and ansible_os_family == 'Debian' (apt is
Debian-only), matching codex. macOS skips apt entirely.
2026-06-21 09:39:23 +00:00
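Sketch of the apt guard in e51e07f210, assuming a package-list variable named acp_example_packages (hypothetical; the real roles use their own variable names).

```yaml
- name: Install ACP prerequisites (Debian family only)
  ansible.builtin.apt:
    name: "{{ acp_example_packages }}"   # hypothetical variable name
    state: present
    update_cache: true
  become: true
  # A list under "when" is ANDed: both conditions must hold,
  # so macOS hosts and empty package lists skip apt entirely.
  when:
    - ansible_os_family == 'Debian'
    - acp_example_packages | length > 0
```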
4ea7adbfa2 fix(macos): portable listener check (ss->lsof fallback) in acp+bridge validate
ss is Linux-only; on macOS it is absent and the validate tasks failed with
[Errno 2] No such file or directory: ss. Use ss when present, else lsof
(-nP -iTCP -sTCP:LISTEN). The acp diagnostic checks are also made non-fatal;
the bridge capture keeps host:port output so its port assertion still matches.
2026-06-21 08:56:26 +00:00
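Sketch of the ss-to-lsof fallback in 4ea7adbfa2. The lsof flags are the ones named in the commit; the task name, the ss flags, and the register variable are assumptions.

```yaml
- name: Capture listening TCP sockets (ss when present, else lsof)
  ansible.builtin.shell: |
    if command -v ss >/dev/null 2>&1; then
      ss -ltn                        # Linux: listening TCP sockets, numeric
    else
      lsof -nP -iTCP -sTCP:LISTEN    # macOS: same information via lsof
    fi
  register: example_listeners        # hypothetical variable name
  changed_when: false                # read-only check, never reports "changed"
  failed_when: false                 # diagnostic only, does not abort the play
```

Both commands print host:port pairs, so a later port assertion can match on the captured stdout.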
Haitao Pan
cc2e010377 fix(acp): gate caddy validation tasks behind manage_caddy 2026-06-21 16:35:43 +08:00
Haitao Pan
4d63a66cf4 fix(acp): omit owner and group on macOS for codex and opencode bridge extraction 2026-06-21 16:29:32 +08:00
c07874b4d4 feat(caddy): OS-aware caddy_config_dir (Linux /etc/caddy, macOS brew)
Add caddy_config_dir = /etc/caddy on Linux, /opt/homebrew/etc/caddy on macOS.
Derive the Caddyfile / conf.d / fragment paths in the caddy role and the
gateway_openclaw/litellm/xworkmate_bridge roles from it, so a force-enabled
Caddy (caddy_enabled=true) on macOS writes to the Homebrew location instead of
the unwritable /etc/caddy. Default (caddy_enabled=false on macOS) still skips
Caddy entirely.
2026-06-21 16:10:32 +08:00
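Sketch of the OS-aware base directory in c07874b4d4. caddy_config_dir and both paths come from the commit; the two derived variable names are hypothetical.

```yaml
# Base directory: Homebrew location on macOS, /etc/caddy elsewhere.
caddy_config_dir: "{{ '/opt/homebrew/etc/caddy' if ansible_os_family == 'Darwin' else '/etc/caddy' }}"

# Derived paths follow the base directory instead of hardcoding /etc/caddy.
caddy_example_caddyfile: "{{ caddy_config_dir }}/Caddyfile"   # hypothetical name
caddy_example_conf_dir: "{{ caddy_config_dir }}/conf.d"       # hypothetical name
```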
784f683a3b feat(caddy): gate litellm + bridge Caddy on caddy_enabled
litellm: the Caddy fragment-dir task missed the gate its siblings had; gate it
on caddy_enabled + litellm_caddy_config_enabled. xworkmate_bridge: wrap the
whole Caddy ingress block in caddy_enabled so macOS single-host never touches
/etc/caddy (the bridge service task stays outside the block).
2026-06-21 16:09:46 +08:00
0cfd1af1b7 feat(caddy): gate Caddy behind caddy_enabled (Linux on, macOS off)
Add caddy_enabled (group_vars/all) defaulting to ansible_os_family != 'Darwin',
overridable via -e caddy_enabled=true/false. Wrap the dedicated caddy role and
the gateway_openclaw Caddy ingress block in 'when: caddy_enabled | bool' so
macOS single-host deploys never touch /etc/caddy or start caddy, while Linux
VPS deploys keep Caddy + HTTP/TLS by default. Notifies only fire from gated
tasks, so the Reload caddy handlers stay inert when disabled.
2026-06-21 16:07:33 +08:00
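Sketch of the caddy_enabled gate in 0cfd1af1b7. The variable name, its default, and the 'when' expression are from the commit; the file layout and the play are assumptions.

```yaml
# group_vars/all (exact file layout assumed)
caddy_enabled: "{{ ansible_os_family != 'Darwin' }}"
---
# Playbook (hypothetical): the caddy role is skipped when the flag is false.
- hosts: all
  roles:
    - role: caddy
      when: caddy_enabled | bool
```

Passing -e caddy_enabled=true on the command line overrides the default, since extra vars take precedence over group_vars. The '| bool' filter matters there because extra vars given as key=value arrive as strings.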
17e2267449 fix(openclaw): retry compile-cache reset to survive rmtree race
On macOS the compile cache is still being written by OpenClaw while ansible
removes it, so shutil.rmtree fails with [Errno 66] Directory not empty. Retry
the deletion (5x, 3s) until the directory is gone.
2026-06-21 16:06:56 +08:00
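Sketch of the retried deletion in 17e2267449, using the 5x / 3s values from the commit. The path variable and register name are hypothetical.

```yaml
- name: Reset OpenClaw compile cache (retry while it is still being written)
  ansible.builtin.file:
    path: "{{ openclaw_example_compile_cache_dir }}"   # hypothetical variable
    state: absent
  register: example_cache_reset
  # Re-run the removal until it succeeds, up to 5 retries, 3 seconds apart.
  until: example_cache_reset is succeeded
  retries: 5
  delay: 3
```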
Haitao Pan
cdbfb2e92a feat(deploy): refactor XWorkmate and OpenClaw deployments to use offline GitHub Release tarballs 2026-06-19 19:50:06 +08:00
Haitao Pan
6aa240c16b fix(permissions): add missing become:true to all cross-platform /usr/local/bin writes 2026-06-19 18:56:47 +08:00
Haitao Pan
a3e570371a fix(acp_server): add missing become:true to xworkmate bridge binary copy tasks 2026-06-19 18:55:25 +08:00
Haitao Pan
f7800111b2 fix(ufw): ensure UFW tasks only run on Debian/Ubuntu and restore missing enable_ufw default 2026-06-19 18:49:06 +08:00
Haitao Pan
7635677dbf fix(caddy): resolve malformed YAML escapes and quotes introduced during caddy refactoring 2026-06-19 18:34:04 +08:00
Haitao Pan
4164e1ff91 refactor(caddy): completely refactor Caddy macOS paths and missing privileges across all roles 2026-06-19 18:22:53 +08:00
Haitao Pan
f66a118c57 fix: resolve Caddy permission denied and macOS path issues in acp_server_codex 2026-06-19 18:16:25 +08:00
Haitao Pan
a0b27a7aee chore: commit pending infra playbook changes including ssh initialization script 2026-06-19 18:09:16 +08:00
Haitao Pan
51565ecf66 fix: resolve nodejs/npm dependency conflict and caddy path/permission issues 2026-06-19 18:08:33 +08:00
Haitao Pan
402c90967a fix: correct acp_server_gemini template name and update nodejs packages for offline installation 2026-06-19 12:25:15 +08:00
Haitao Pan
7613a848a2 fix: stop systemd user status service and timer before cleaning up repo directory to avoid race conditions 2026-06-19 12:16:13 +08:00
Haitao Pan
5facdd5331 fix: remove restrictive version check from stale xworkspace-console repo cleanup task 2026-06-19 12:01:40 +08:00
Haitao Pan
edc70fb658 fix: remove stale repo + depth=1 for clone; macOS browser/npm/agent_skills/role defaults compatibility 2026-06-19 11:37:33 +08:00
Haitao Pan
45f6f3af89 refactor(acp_server_gemini): upgrade to use antigravity-cli 2026-06-19 10:42:24 +08:00
e4aa8affee
Merge pull request #18 from ai-workspace-infra/codex/fix-vault-admin-idempotency
fix: make Vault admin bootstrap idempotent
2026-06-19 09:54:40 +08:00
Haitao Pan
d876d69684 fix: make Vault admin bootstrap idempotent 2026-06-19 09:54:21 +08:00
40395ba0a2
Merge pull request #17 from ai-workspace-infra/codex/retry-openclaw-cache-reset
fix: retry OpenClaw compile cache reset
2026-06-19 09:51:11 +08:00
Haitao Pan
c57642dce2 fix: retry OpenClaw compile cache reset 2026-06-19 09:50:46 +08:00
9ce92f6cf2
Merge pull request #16 from ai-workspace-infra/codex/skip-postgres-common-on-macos
fix: skip common baseline for macOS Postgres
2026-06-19 09:49:03 +08:00
Haitao Pan
06e10fb0e1 fix: skip common baseline for macOS Postgres 2026-06-19 09:48:42 +08:00
d745e26188
Merge pull request #15 from ai-workspace-infra/codex/fix-vault-macos-standalone-paths
fix: use user Vault paths on macOS
2026-06-19 09:46:08 +08:00
Haitao Pan
7c5884c615 fix: use user Vault paths on macOS 2026-06-19 09:45:46 +08:00
c56ac0561c
Merge pull request #14 from ai-workspace-infra/codex/fix-xworkmate-bridge-macos-base-dir
fix: use user bridge directory on macOS
2026-06-19 09:39:51 +08:00
51d28b5d8b fix(postgres): install via brew command on macOS, not homebrew module
The community.general.homebrew module auto-detects a brew prefix and can pick a
stale Intel Homebrew at /usr/local that crashes on newer macOS versions
('unknown or unsupported macOS version'). Use a brew command with the Apple
Silicon prefix first on PATH (matching vault/openclaw), plus
HOMEBREW_NO_AUTO_UPDATE, keeping the task idempotent.
2026-06-18 12:55:51 +00:00
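Sketch of the brew-command approach in 51d28b5d8b. It assumes the standard Apple Silicon prefix /opt/homebrew, gathered facts (for ansible_env), and a formula variable that is hypothetical; the actual task may achieve idempotency differently.

```yaml
- name: Install PostgreSQL via brew with the Apple Silicon prefix first on PATH
  ansible.builtin.command:
    cmd: brew install {{ postgres_example_formula }}   # hypothetical variable
    # Skip the command when the formula's opt link already exists.
    creates: "/opt/homebrew/opt/{{ postgres_example_formula }}"
  environment:
    # /opt/homebrew/bin first, so a stale Intel brew in /usr/local is not picked.
    PATH: "/opt/homebrew/bin:{{ ansible_env.PATH }}"
    HOMEBREW_NO_AUTO_UPDATE: "1"
```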
b85a80b8f8 fix(vault): resolve admin entity_id via entity-alias (idempotent bootstrap)
Logging in to obtain entity_id becomes MFA-gated once the login enforcement
exists, so re-runs failed with 'missing entityID'. Look up the entity via its
userpass entity-alias (create entity+alias on first run) and drop the now-unused
bootstrap token revocation. Idempotent and backward compatible.
2026-06-18 12:39:42 +00:00
a7ad856e05 fix(common): macOS (Darwin) compatibility for baseline
The Base hardening tasks (timedatectl timezone, /etc/hostname, hostname,
/etc/hosts, ssh hardening, fail2ban, file limits, firewall) use become: true
and Linux-only tooling, so they fail on macOS where the deploy is unprivileged
(timedatectl is also absent). Guard the whole Base block with
ansible_os_family != 'Darwin'.

Add a Common | Darwin baseline branch (common_darwin.yml) that installs shared
Homebrew CLI prerequisites (jq) used by helper scripts in other roles, e.g.
vault's init_vault_admin.sh. Packages are listed in common_darwin_brew_packages.
2026-06-18 12:12:17 +00:00
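Sketch of the Darwin guard in a7ad856e05. The condition and common_darwin.yml are from the commit; the task names, the single timezone task, and its variable are placeholders for the real Base block.

```yaml
- name: Common | Base hardening (Linux only)
  when: ansible_os_family != 'Darwin'   # whole block is skipped on macOS
  become: true
  block:
    - name: Set timezone
      ansible.builtin.command: timedatectl set-timezone {{ common_example_timezone }}   # hypothetical variable

- name: Common | Darwin baseline
  ansible.builtin.include_tasks: common_darwin.yml
  when: ansible_os_family == 'Darwin'
```

A 'when' on a block is applied to every task inside it, so the privileged Linux-only tasks never run on macOS.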
Haitao Pan
7dcd2307ea fix: use user bridge directory on macOS 2026-06-18 18:11:31 +08:00
de143241c8
Merge pull request #13 from ai-workspace-infra/codex/fix-acp-macos-path-defaults
fix: add macOS ACP path defaults
2026-06-18 18:08:44 +08:00
Haitao Pan
33ef20e064 fix: add macOS ACP path defaults 2026-06-18 18:08:14 +08:00
7938e485a5
Merge pull request #12 from ai-workspace-infra/codex/omit-acp-binary-chown-on-macos
fix: omit ACP binary chown on macOS
2026-06-18 18:04:43 +08:00
Haitao Pan
7e886ec009 fix: omit ACP binary chown on macOS 2026-06-18 18:04:19 +08:00
014cc06824
Merge pull request #11 from ai-workspace-infra/codex/guard-linux-acp-tasks-on-macos
[codex] Guard Linux ACP tasks on macOS
2026-06-18 17:52:45 +08:00
Haitao Pan
740b0a5e72 fix: guard Linux ACP tasks on macOS 2026-06-18 17:52:21 +08:00
15ffca6368
Merge pull request #10 from ai-workspace-infra/codex/skip-openclaw-caddy-on-macos
[codex] Skip OpenClaw Caddy tasks on macOS
2026-06-18 17:49:41 +08:00
Haitao Pan
091cd1bfc1 fix: skip OpenClaw Caddy tasks on macOS 2026-06-18 17:49:23 +08:00
3d03134a62
Merge pull request #9 from ai-workspace-infra/codex/restore-macos-launchd-roles
[codex] Preserve macOS launchd service roles
2026-06-18 17:46:40 +08:00
Haitao Pan
bd6de2ba7b fix: preserve macOS launchd service roles 2026-06-18 17:46:24 +08:00
735132ea25
Merge pull request #8 from ai-workspace-infra/codex/fix-macos-native-deployment
[codex] Fix native macOS deployment
2026-06-18 17:45:48 +08:00
Haitao Pan
4ca20c8603 fix: support native macOS deployment 2026-06-18 17:45:01 +08:00
Haitao Pan
e83c1a73ac fix: replace hardcoded ubuntu user/group/home in 6 vhost role defaults for macOS 2026-06-18 16:57:38 +08:00
Haitao Pan
39bdc7c1fd fix: make agent CLI version check non-fatal (codex optional dep on macOS) 2026-06-18 16:53:53 +08:00
Haitao Pan
8a62cc4e59 fix: use argv for chromium version check to handle paths with spaces 2026-06-18 16:49:18 +08:00
Haitao Pan
4a60ff30e4 fix: make agent_skills defaults cross-platform (HOME, user, group) 2026-06-18 16:46:16 +08:00
Haitao Pan
a3fd2679ef fix: skip apt rsync installation on macOS (rsync is built-in) 2026-06-18 16:44:01 +08:00
Haitao Pan
74d027e649 fix: ensure env directory exists before writing playwright configuration on macOS 2026-06-18 16:32:31 +08:00
Haitao Pan
aabf296461 fix: disable apt and become for browser setup on macOS 2026-06-18 16:30:04 +08:00
Haitao Pan
044a264256 feat: full macOS (Darwin) compatibility fixes for Ansible playbooks 2026-06-18 16:26:51 +08:00
Haitao Pan
c7784f2063 fix: restart opencode acp through launchd 2026-06-18 14:51:40 +08:00
Haitao Pan
5e3db5dfd5 feat: run opencode acp with launchd on macos 2026-06-18 14:51:06 +08:00
Haitao Pan
75c4c98613 feat: run codex acp with launchd on macos 2026-06-18 14:50:14 +08:00
Haitao Pan
2946a7bc42 fix: route codex acp setup on macos 2026-06-18 14:50:00 +08:00
Haitao Pan
dbbce5ff49 feat: support macos runtime deployment 2026-06-18 14:48:04 +08:00
Haitao Pan
0e1f8ab7cf fix: install openclaw multi-session plugin 2026-06-18 10:01:51 +08:00
Haitao Pan
532c57a359 fix(offline): skip online repos for docker/nodejs and add ubuntu 26.04 support 2026-06-17 20:43:16 +08:00
Haitao Pan
c1162f7ea2 fix(qmd): configure LiteLLM embedding gateway and inject auth token 2026-06-17 14:43:34 +08:00
Haitao Pan
13d986a078 feat(ai-workspace): add Vault KV secrets dump and restore 2026-06-17 14:09:12 +08:00
Haitao Pan
5e363249ce feat(ai-workspace): add encrypted backup and restore playbooks 2026-06-17 14:05:06 +08:00
Haitao Pan
1ac560e482 feat(ai-workspace): add backup/restore/migration role and playbook 2026-06-17 13:59:49 +08:00
Haitao Pan
b36a1c44e5 fix(firewall): allow ssh http https ingress 2026-06-17 13:59:49 +08:00
Haitao Pan
e5991301c6 feat(ai): parameterize LiteLLM URL and models for gateway_openclaw and acp_server_hermes to avoid hardcoded ports 2026-06-17 06:45:06 +08:00
Haitao Pan
3809a8cb6b feat(ai): configure Hermes and OpenClaw to safely connect to local LiteLLM API endpoint by default using AI_WORKSPACE_AUTH_TOKEN 2026-06-16 23:19:30 +08:00
Haitao Pan
596f52ba12 fix(litellm): revert DEEPSEEK_API_KEY fallback to litellm_master_key 2026-06-16 23:10:54 +08:00
Haitao Pan
d49b472ddb fix(litellm): add DEEPSEEK_API_KEY and OPENAI_API_KEY to litellm environment variables 2026-06-16 23:01:19 +08:00
Haitao Pan
93cbe2cd1b feat: allow /ui* and /health in caddy allowed_api for minimal gateway mode 2026-06-16 16:51:58 +08:00
Haitao Pan
5630df788a fix: make ai runtime npm installs idempotent 2026-06-16 15:04:14 +08:00
Haitao Pan
7936f65485 Fix git safe.directory for console prefetch 2026-06-16 09:24:44 +08:00
Haitao Pan
1c6ebc36ba docs: pin runtime asset names 2026-06-15 22:02:52 +08:00
Haitao Pan
c07d12b5fe feat: consume prebuilt workspace runtimes 2026-06-15 21:58:50 +08:00
Haitao Pan
e4b04f95fe feat(xrdp): provision and enforce standard user login instead of root 2026-06-15 18:41:15 +08:00
Haitao Pan
d92979f22d fix(litellm): ensure config directory and users exist before provisioning database 2026-06-15 18:31:54 +08:00
Haitao Pan
2658727d19 feat: increase ClientAliveCountMax to 15 2026-06-15 18:13:55 +08:00
Haitao Pan
dcf49e4ebf feat: configure SSH ClientAlive settings for persistent sessions 2026-06-15 18:07:12 +08:00
Haitao Pan
ba4ef489aa chore: ignore gitleaks false positive in docs 2026-06-15 18:02:37 +08:00
Haitao Pan
126a19e282 feat(security): add SSH hardening, fail2ban tasks, connection check helper, and doc 2026-06-15 17:50:00 +08:00
Haitao Pan
c627f016bf fix: move ACP service checks to final validation phase 2026-06-15 16:59:03 +08:00
Haitao Pan
5f00409550 fix: correct npm global bin path for acp_server_codex 2026-06-15 16:36:12 +08:00
Haitao Pan
40ed86a070 feat: deliver versioned AI Workspace Runtime (role split, run-mode matrix, bridge domain) 2026-06-15 16:12:37 +08:00
Haitao Pan
178664f262 feat: allow a portable LiteLLM Python runtime 2026-06-15 15:44:52 +08:00
Haitao Pan
2243b5d0c8 fix: support LiteLLM on Debian 11 2026-06-15 15:36:20 +08:00
Haitao Pan
65aef78937 fix: trust NodeSource armored signing key 2026-06-15 15:16:06 +08:00
Haitao Pan
2f4d3ad930 fix: make offline runtime reprovisioning stable 2026-06-15 15:12:56 +08:00
Haitao Pan
39dbb7b5f9 feat: allow packaged console source 2026-06-15 14:43:24 +08:00
Haitao Pan
3793143466 fix: wait safely for apt locks 2026-06-15 14:32:24 +08:00
Haitao Pan
437d50c095 docs: add offline install todo 2026-06-15 13:52:16 +08:00
Haitao Pan
981d83acab docs: add deployment todo checklist 2026-06-15 10:42:48 +08:00
Haitao Pan
4228c1a6df fix: correct docker repository task yaml 2026-06-14 14:19:42 +08:00
Haitao Pan
cfe89432a1 fix: allow pinned nodejs runtime downgrades 2026-06-14 13:50:05 +08:00
Haitao Pan
645ac9bd17 fix: support Debian runtime deployment paths 2026-06-14 13:47:26 +08:00
Haitao Pan
3084ab7940 feat: deliver versioned AI Workspace Runtime 2026-06-14 13:19:44 +08:00
Haitao Pan
f15c384a34 fix: provision local litellm db and qmd fallback 2026-06-14 11:25:28 +08:00
Haitao Pan
6346684af5 fix: support standalone postgres and dynamic litellm path 2026-06-14 11:09:52 +08:00
Haitao Pan
bfb6b17e29 fix: run standalone vault without inventory group 2026-06-14 10:54:22 +08:00
Haitao Pan
2319c592fb feat: support standalone vault deployment 2026-06-14 10:42:41 +08:00
Haitao Pan
41853eedd9 fix: allow bridge validation url override 2026-06-14 10:30:06 +08:00
Haitao Pan
5e359cc5d8 fix: resolve openclaw user uid dynamically 2026-06-14 10:16:27 +08:00
Haitao Pan
4b6b1de8a7 fix: reload openclaw user systemd bus 2026-06-14 10:08:22 +08:00
Haitao Pan
0b344b5bd0 fix: deploy openclaw before bridge validation 2026-06-14 10:02:26 +08:00
Haitao Pan
ae78231fac fix: bootstrap hermes acp shim 2026-06-14 09:54:43 +08:00
Haitao Pan
cd2d4b0046 fix: install caddy for workspace routes 2026-06-14 09:46:55 +08:00
Haitao Pan
7f6854e9de fix: sync agent skills over local connection 2026-06-14 09:33:58 +08:00
Haitao Pan
4c330b7e1c fix: install go for workspace api 2026-06-14 09:27:59 +08:00
Haitao Pan
a15016ef1f feat: install agent cli toolchain 2026-06-14 09:25:30 +08:00
Haitao Pan
e2ae564745 feat: unify ai workspace deployment auth 2026-06-14 09:09:40 +08:00
Haitao Pan
4b7c52057d chore: unify xworkspace console service 2026-06-13 07:43:11 +08:00
Haitao Pan
f3ab617db6 docs: update bootstrap script URL to point to xworkspace-console repo 2026-06-12 19:47:39 +08:00
Haitao Pan
cc41ff61db chore: move bootstrap script to xworkspace-console repo 2026-06-12 19:47:16 +08:00
Haitao Pan
604132e604 chore: move setup-ai-workspace-all-in-one.sh to scripts directory 2026-06-12 19:45:59 +08:00
Haitao Pan
c784b621f6 fix: add force=true to litellm systemd symlink to allow out-of-order creation 2026-06-12 19:33:43 +08:00
Haitao Pan
1f7d85b35d fix: patch tsconfig.json to ES2022 to support Array.at() during dashboard build 2026-06-12 19:27:48 +08:00
Haitao Pan
60269ee222 fix: replace local rsync with git clone for xworkspace-console dashboard to support public bootstrap scripts 2026-06-12 19:25:44 +08:00
Haitao Pan
74b3411336 feat: auto-generate or reuse DEPLOY_TOKEN for local ansible vault 2026-06-12 19:20:12 +08:00
Haitao Pan
811b17962b feat: add bootstrap script setup-ai-workspace-all-in-one.sh for curl|bash deployment 2026-06-12 19:18:05 +08:00
Haitao Pan
56b33a3231 docs: update setup-ai-workspace-all-in-one.md TLDR and params 2026-06-12 16:47:46 +08:00
Haitao Pan
f424327cfb feat: add public_access control to xworkspace-console 2026-06-12 15:31:24 +08:00
Haitao Pan
affd6827b0 docs: add TLDR section to setup-ai-workspace-all-in-one.md 2026-06-12 14:37:21 +08:00
Haitao Pan
7d1a86e412 docs: add setup-ai-workspace-all-in-one deployment guide and security notes 2026-06-12 14:36:10 +08:00
Haitao Pan
944d59f911 feat: standardise public_access controls across roles and introduce global security_level 2026-06-12 14:31:25 +08:00
Haitao Pan
6d6a3a8593 fix: correct yaml formatting in host_vars litellm.yml 2026-06-12 13:03:28 +08:00
Haitao Pan
b8d4df9230 docs: rename var to litellm_api_caddy_strict_whitelist and update documentation 2026-06-12 09:44:24 +08:00
Haitao Pan
1574287a4d feat: add litellm_api_caddy_public_access variable to control Caddy proxy behavior 2026-06-12 09:39:45 +08:00
Haitao Pan
e9dec70225 docs: relax Caddy routing to allow LiteLLM UI backend API calls 2026-06-12 09:36:03 +08:00
Haitao Pan
e3952916af docs: reformat litellm deployment guide to complement readme 2026-06-12 09:21:37 +08:00
Haitao Pan
47d4931ff7 docs: update litellm README to Minimal AI API Gateway spec and clean up config 2026-06-12 09:11:12 +08:00
Haitao Pan
7ef5005ae1 refactor(litellm): remove hardcoded provider API keys from defaults and env templates 2026-06-12 09:08:33 +08:00
Haitao Pan
9196625bd0 feat(litellm): enable STORE_MODEL_IN_DB to allow UI model management 2026-06-11 22:46:22 +08:00
Haitao Pan
a076370b68 security(litellm): move plain text master key to vault encrypted host_vars 2026-06-11 22:45:18 +08:00
Haitao Pan
21cbbca9be fix(litellm): use UI_USERNAME and UI_PASSWORD env vars instead of LITELLM_ prefixed 2026-06-11 22:33:35 +08:00
Haitao Pan
c22a8c8266 feat(litellm): serve UI on api domain and clear default model lists 2026-06-11 21:45:10 +08:00
Haitao Pan
cdf06da6d9 chore: add .gitleaksignore to whitelist false positive public keys from git history 2026-06-11 18:36:56 +08:00
Haitao Pan
a77d2fedfb refactor(litellm): use ansible vault for database password 2026-06-11 18:32:55 +08:00
Haitao Pan
b4ebecc32d refactor(litellm): remove hardcoded database password and use env lookup instead 2026-06-11 18:31:02 +08:00
Haitao Pan
629016185d chore: add gitleaks ignore for public wireguard keys 2026-06-11 18:29:35 +08:00
Haitao Pan
96ad38ff14 fix(litellm): disable Caddy basic auth and remove manual schema application to avoid migration conflicts 2026-06-11 18:28:18 +08:00
Haitao Pan
c1cb19b59b fix(litellm): add PATH to systemd unit to expose prisma-client-py 2026-06-11 17:29:07 +08:00
Haitao Pan
1d8516d160 fix(litellm): add PYTHONPATH to systemd unit, grant all table/sequence permissions to litellm DB user 2026-06-11 17:21:19 +08:00
Haitao Pan
72763856d3 fix(litellm): pin stable DB password in host_vars to prevent random password drift between templates 2026-06-11 17:14:09 +08:00
Haitao Pan
9cde355688 fix(litellm): sslmode=disable for localhost Docker PG, remove environment_variables override from config.yaml 2026-06-11 17:09:49 +08:00
Haitao Pan
e6a3d95578 fix(litellm): install prisma client and generate prisma bindings correctly during deployment 2026-06-11 16:45:22 +08:00
Haitao Pan
814a81f088 feat(litellm): support dynamic master key via extra vars and generate caddy bcrypt hash on the fly 2026-06-11 16:33:17 +08:00
Haitao Pan
ed8a78e932 fix(litellm): configure actual master key and basic auth hash 2026-06-11 16:26:59 +08:00
Haitao Pan
d5a17a8301 fix(litellm): allow access to root path on ui domain instead of returning 404 2026-06-11 16:15:06 +08:00
Haitao Pan
01af16cd54 fix(litellm): use docker exec for pg provisioning 2026-06-11 16:14:03 +08:00
Haitao Pan
a68cf68d14 feat(litellm): restore secure automated DB provisioning using raw sudo psql 2026-06-11 16:09:12 +08:00
Haitao Pan
d57ef6458d chore(litellm): skip automated db provisioning due to missing superuser password 2026-06-11 15:57:25 +08:00
Haitao Pan
4a14572b5b fix(litellm): revert become_user to local TCP password auth 2026-06-11 15:56:43 +08:00
Haitao Pan
e2b7f0366c chore: enable allow_world_readable_tmpfiles to allow postgres db provisioning 2026-06-11 15:55:54 +08:00
Haitao Pan
fc7a23617c fix(litellm): use become_user postgres for db provisioning 2026-06-11 15:50:51 +08:00
Haitao Pan
fc1bff0061 fix(litellm): bypass stunnel and use port 5432 for local DB provisioning 2026-06-11 15:47:09 +08:00
Haitao Pan
db9d564ef3 fix(litellm): install psycopg2 before provisioning db 2026-06-11 15:35:11 +08:00
Haitao Pan
d573a4651b fix(litellm): remove delegate_to 127.0.0.1 in provision-database 2026-06-11 15:33:51 +08:00
Haitao Pan
ce6d970bda feat(litellm): separate api/ui caddy fragments, add models, secure db with sslmode 2026-06-11 15:29:31 +08:00
Haitao Pan
a817a0e732 fix(litellm): install litellm[proxy] to get all deps incl websockets 2026-06-11 11:42:16 +08:00
Haitao Pan
e56cb63032 fix(litellm): add PYTHONPATH env and fix websockets dep for litellm service 2026-06-11 11:41:29 +08:00
Haitao Pan
e5efac92e4 feat: add litellm gateway deployment playbook and role 2026-06-11 10:05:42 +08:00
Haitao Pan
2252d24708 fix: kill legacy http.server, reload systemd and start services after deploy 2026-06-09 20:40:46 +08:00
a421eb2e4f
xworkspace-portal.service: serve dashboard on port 17000 (#7)
* xworkspace-portal.service: use dashboard on port 17000

- Change portal service from python http.server:7000 to npm dev server:17000
- Update chrome app launcher to use port 17000
- Add dashboard source sync and npm install tasks
- Update portal URL and port variables

* production: build dashboard on target with npm run build, serve with npm run preview

- Add xworkspace_console_dashboard_local_src variable for local dashboard path
- Sync dashboard source from controller to target via rsync
- Build dashboard with npm install && npm run build on target
- Serve production build with npm run preview instead of dev
- Copy dist/ and package.json to portal directory for preview server

---------

Co-authored-by: Haitao Pan <manbuzhe2009@qq.com>
2026-06-09 19:54:24 +08:00
Haitao Pan
1414fe588f production: build dashboard on target with npm run build, serve with npm run preview
- Add xworkspace_console_dashboard_local_src variable for local dashboard path
- Sync dashboard source from controller to target via rsync
- Build dashboard with npm install && npm run build on target
- Serve production build with npm run preview instead of dev
- Copy dist/ and package.json to portal directory for preview server
2026-06-09 19:46:11 +08:00
Haitao Pan
b58a74892c xworkspace-portal.service: use dashboard on port 17000
- Change portal service from python http.server:7000 to npm dev server:17000
- Update chrome app launcher to use port 17000
- Add dashboard source sync and npm install tasks
- Update portal URL and port variables
2026-06-09 19:28:00 +08:00
Haitao Pan
aee1f2b5d5 Ensure workspace user exists before provisioning 2026-06-08 18:40:26 +08:00
Haitao Pan
466cb29a1b Install ttyd from official release binary 2026-06-08 18:21:26 +08:00
Haitao Pan
b6d18f5944 Install Chrome source before workspace runtime packages 2026-06-08 18:13:53 +08:00
Haitao Pan
42b8443f91 Allow common HTTP and HTTPS ports 2026-06-08 17:43:53 +08:00
Haitao Pan
8c71b27112 feat(playbook): add xworkspace console runtime setup 2026-06-08 07:14:24 +08:00
Haitao Pan
7e0dc61924 fix: preserve xworkmate bridge review token in ingress 2026-06-07 23:01:47 +08:00
Haitao Pan
f451b5cd20 fix(playbook): move openclaw session contract checks out of deploy validation
The OpenClaw session contract smoke and SSE long-task stream checks lived in
roles/vhosts/xworkmate_bridge/tasks/validate.yml and ran during the Deploy
stage. They depend on the public OpenClaw gateway producing a 'pong' reply,
which the deployed bridge cannot guarantee end-to-end. When the gateway
returns an empty completion envelope, the entire Deploy job fails after the
bridge binary has already been installed and is healthy.

Move these checks to the GitHub Actions validate stage in xworkmate-bridge
where they belong. The bridge's own deploy validation now only asserts the
bridge's own state (Caddy config, systemd unit, ports, /api/ping, /acp/rpc
capabilities, routing.resolve).
2026-06-05 19:28:38 +08:00
Haitao Pan
6c234f9544 fix(playbook): update openclaw smoke tests to poll for async task completion 2026-06-04 14:48:31 +08:00
Haitao Pan
6d3418284a fix(playbook): adjust system-level xworkmate-bridge.service to run as ubuntu user and ensure the user exists 2026-06-04 14:36:24 +08:00
Haitao Pan
d7199c511b fix(playbook): stop, disable, and clean up obsolete user-level xworkmate-serve service to prevent port 8787 conflicts 2026-06-04 14:30:13 +08:00
Haitao Pan
61eb40624d fix(xworkmate_bridge): resolve config.yaml PermissionError during deployment caused by immutable flag 2026-06-04 11:48:09 +08:00
Haitao Pan
dcdc9bea7b feat: Remote Desktop Ansible Deployment for xworkmate-bridge 2026-06-03 10:49:49 +08:00
Haitao Pan
2f2e9d8f9b fix: pin OpenClaw Codex plugin 2026-06-01 14:53:18 +08:00
Haitao Pan
ba4daa3597 fix: align bridge OpenClaw protocol 4 deployment 2026-06-01 13:48:52 +08:00
Haitao Pan
402faa02e1 fix: validate bridge token consistency 2026-06-01 10:02:13 +08:00
Haitao Pan
ce0dd3cee1 Wire review bridge token deployment 2026-05-30 10:34:51 +08:00
Haitao Pan
e3921518ba Unify AI agent runtime deployment 2026-05-26 14:11:01 +08:00
Haitao Pan
003d48e748 Merge branch 'codex/acp-connection-closed-cleanup' 2026-05-26 13:56:22 +08:00
Haitao Pan
69e7691287 chore: align AI agent runtime playbooks 2026-05-26 12:58:56 +08:00
Haitao Pan
71e3449622 Use SSE curl for OpenClaw validation 2026-05-26 11:29:25 +08:00
Haitao Pan
805a3fbda9 Focus bridge validation on OpenClaw RPC 2026-05-26 11:26:21 +08:00
Haitao Pan
22662cc538 Validate OpenClaw through bridge RPC 2026-05-26 11:06:22 +08:00
Haitao Pan
7fbba293a0 Fix Hermes deploy validation status check 2026-05-23 16:04:50 +08:00
Haitao Pan
f51958a4a2 chore: set xworkmate bridge openclaw active budget to five 2026-05-22 19:13:26 +08:00
Haitao Pan
aa674a7dac fix: serialize xworkmate bridge openclaw tasks 2026-05-22 19:10:31 +08:00
Haitao Pan
9765158371 fix: validate ebook over public HTTPS 2026-05-20 16:35:46 +08:00
Haitao Pan
5ff5e2f1eb fix: validate ebook vhost over local TLS 2026-05-20 16:35:03 +08:00
Haitao Pan
dfad2a0a5c fix: use Caddy conf.d for ebook vhost 2026-05-20 16:34:30 +08:00
Haitao Pan
cd131e79f4 fix: keep ebook deploy on Node 24 hosts 2026-05-20 16:28:43 +08:00
Haitao Pan
29dd6a38b7 feat: deploy modern IT history ebook 2026-05-20 16:27:54 +08:00
Haitao Pan
ae1e5813a9 fix: allow OpenClaw bridge validation to finish 2026-05-18 17:53:55 +08:00
Haitao Pan
4b2ab8401b Align XFCE XRDP browser setup with Chrome deb 2026-05-18 05:42:17 +08:00
Haitao Pan
72bee745b3 tune openclaw default thinking for gateway tasks 2026-05-15 12:29:01 +08:00
Haitao Pan
0c3e673e78 fix openclaw gateway default model deploy config 2026-05-15 12:10:31 +08:00
Haitao Pan
07f72e2c46 Relax bridge SSE keepalive validation 2026-05-11 14:45:27 +08:00
Haitao Pan
ad49ba1b22 Configure OpenClaw admission through bridge config 2026-05-11 13:21:41 +08:00
Haitao Pan
b6b0e3ddad Use OpenClaw default agent model 2026-05-11 12:53:39 +08:00
Haitao Pan
3ae95ea54d Enable production OpenClaw artifact plugin 2026-05-11 12:35:09 +08:00
Haitao Pan
6c1ad92ff4 Handle live OpenClaw gateway runtime path 2026-05-11 12:14:31 +08:00
Haitao Pan
f023bd3961 Configure stable OpenClaw concurrency 2026-05-11 11:47:09 +08:00
Haitao Pan
95efae0060 Configure stable OpenClaw concurrency 2026-05-11 11:45:32 +08:00
Haitao Pan
1fa9ca2457 fix: validate OpenClaw SSE ingress 2026-05-08 18:58:51 +08:00
Haitao Pan
9f3449b635 fix: proxy xworkmate artifact downloads 2026-05-06 10:05:09 +08:00
Haitao Pan
289468e188 fix: remove legacy acp-server ingress contract 2026-05-03 12:31:07 +08:00
Haitao Pan
a50dc24619 fix: align xworkmate bridge ingress contract 2026-05-03 12:14:27 +08:00
Haitao Pan
e2bbc56e7a fix: keep bridge deploy from mutating openclaw 2026-05-03 11:30:58 +08:00
Haitao Pan
dd0201e483 fix: expose bridge gateway ingress 2026-05-03 11:22:09 +08:00
Haitao Pan
d3efb08e8d chore: submit remaining playbooks changes 2026-05-02 19:41:38 +08:00
Haitao Pan
54b234b2bc fix: reload bridge unit before service start 2026-05-02 19:17:34 +08:00
Haitao Pan
a250cf70e5 fix: remove root openclaw dependency from bridge unit 2026-05-02 19:06:58 +08:00
Haitao Pan
f6167c1e89 fix: run openclaw gateway as user service 2026-05-02 18:51:46 +08:00
Haitao Pan
14c77e6e5e fix: propagate bridge image ref into systemd 2026-05-02 18:20:30 +08:00
Haitao Pan
3d091118c2 fix: retry bridge hermes diagnostic validation 2026-05-02 18:11:17 +08:00
Haitao Pan
9ba79fb05a fix: recover openclaw ollama secret from host env 2026-05-02 17:57:43 +08:00
Haitao Pan
fd9d42b9a5 fix: validate systemd native xworkmate bridge stack 2026-05-02 12:10:08 +08:00
Haitao Pan
d08987120a fix: reload OpenClaw systemd unit before validation 2026-04-30 12:43:53 +08:00
Haitao Pan
176aaf8fcf fix: preserve existing OpenClaw secrets 2026-04-30 12:31:58 +08:00
Haitao Pan
b40003b66d fix: decouple bridge deploy from runtime bootstrap 2026-04-30 12:24:39 +08:00
Haitao Pan
e1dc41e54f fix: disable runtime skills sync for bridge deploy 2026-04-30 12:15:58 +08:00
Haitao Pan
c5f17b1c92 fix: decouple bridge deploy from local skills sync 2026-04-30 12:07:43 +08:00
Haitao Pan
1af963699a fix: avoid external collection for skills sync 2026-04-30 12:05:41 +08:00
Haitao Pan
59a7e6be4d fix: wait for bridge dependency services 2026-04-30 12:02:54 +08:00
Haitao Pan
184a200c40 refactor: improve auth token handling and dynamic configurations
- Dynamically resolve Chromium executable path in ai_agent_runtime.
- Read existing auth tokens from systemd for hermes and xworkmate_bridge.
- Fix yarn gpg key extension in nodejs role.
- Support force install flag in agent_skills.
- Remove openclaw gateway from xworkmate_bridge role.
- Add .playwright-mcp/ to .gitignore.
2026-04-30 11:55:34 +08:00
Haitao Pan
fa98d41b64 feat: add standalone OpenClaw gateway deploy 2026-04-29 19:35:24 +08:00
Haitao Pan
5f1f765660 test: validate hermes empty response contract 2026-04-29 19:27:42 +08:00
Haitao Pan
db60aa1ddf Add scenario skill bootstrap to agent skills role 2026-04-29 11:25:37 +08:00
Haitao Pan
aa2b2e0f2d Update xfce xrdp session docs and template 2026-04-28 18:49:19 +08:00
Haitao Pan
3bf305e793 Add AI agent runtime and shared skills roles 2026-04-28 18:46:01 +08:00
Haitao Pan
966cc16b7f Stabilize XWorkmate ACP service units 2026-04-27 12:31:42 +08:00
Haitao Pan
ce56e0374b Align bridge Caddy validation with preserved paths 2026-04-26 10:51:47 +08:00
Haitao Pan
5318fc28bd Manage OpenClaw gateway as foreground service 2026-04-26 10:49:44 +08:00
Haitao Pan
5e6477e64c Keep bridge validation in bridge role 2026-04-26 10:41:58 +08:00
Haitao Pan
e0769d32bc Preserve bridge RPC paths in Caddy 2026-04-26 10:39:35 +08:00
Haitao Pan
7422c9d41f Run OpenCode through ACP adapter 2026-04-26 10:26:15 +08:00
Haitao Pan
bd3624b77b Deploy xworkmate bridge via systemd 2026-04-26 10:17:38 +08:00
Haitao Pan
92322833d2 Stop standalone bridge before compose deploy 2026-04-24 15:39:29 +08:00
Haitao Pan
ef2f77837f Preserve immutable bridge Caddy fragment 2026-04-24 15:29:45 +08:00
Haitao Pan
4dde19987a Avoid provider execution in bridge route validation 2026-04-24 15:20:12 +08:00
Haitao Pan
f480dc633b Fix xworkmate adapter deployment commands 2026-04-24 15:10:33 +08:00
Haitao Pan
515ba95c75 feat(gpu_inference): add comprehensive GPU inference infrastructure with Sealos, Ray, and vLLM 2026-04-23 19:17:23 +08:00
Haitao Pan
413d46995b Align xworkmate bridge validation with ACP ingress 2026-04-22 00:04:54 +08:00
Haitao Pan
c478863b74 fix(xworkmate_bridge): fix container reachability and auth token mismatch 2026-04-21 18:02:30 +08:00
Haitao Pan
827d78543a fix(deploy): replace fragile curl ping validation with native uri module 2026-04-21 16:34:05 +08:00
Haitao Pan
747426eb25 Harden xworkmate bridge ping validation 2026-04-21 15:23:49 +08:00
Haitao Pan
73bb2822fd chore(deploy): reduce ping validation retries to 3 2026-04-21 14:25:55 +08:00
Haitao Pan
cb4a4bc023 fix(deploy): improve bridge validation robustness and align gateway paths 2026-04-21 14:18:57 +08:00
Haitao Pan
99ca8b4ee8 fix(deploy): clean up gemini environment and force remove bridge container on deploy 2026-04-21 13:49:36 +08:00
Haitao Pan
b1276eee71 Consolidate bridge deploy to docker role only 2026-04-21 11:00:05 +08:00
Haitao Pan
d375eab837 Fix OpenCode ACP validation marker default 2026-04-21 10:03:40 +08:00
Haitao Pan
746b9407ff Handle immutable ACP service unit uploads 2026-04-20 18:55:07 +08:00
Haitao Pan
3f0e21d237 Handle immutable bridge binary uploads 2026-04-20 18:19:07 +08:00
Haitao Pan
ae5f7c5b4e Align xworkmate bridge playbooks with live services 2026-04-20 17:20:03 +08:00
Haitao Pan
acfe7f564d feat(xfce): refactor XFCE role into install and config tasks, and fix session setup
- Split XFCE minimal role into install.yml and config.yml for better modularity.
- Restore .xsession setup with NO_BROWSER=true and exec startxfce4.
- Add support for managing user groups and shell.
- Ensure XRDP services are active and enabled on jp-xhttp-contabo.svc.plus.
2026-04-20 10:53:35 +08:00
Haitao Pan
f20980bdc0 fix(bridge): allow public access to /api/ping and update ACP validation URLs
- Exempt `/` and `/api/ping` from Bearer token authentication in xworkmate-bridge Caddyfile to fix health check failures (401 Unauthorized).
- Update validation tasks to use `https://{{ xworkmate_bridge_domain }}` instead of `http://127.0.0.1`.
- Correct the upstream ACP paths in validation logic (e.g. `/acp-server/codex`).
- Remove redundant Host headers from validation requests.
2026-04-18 17:01:12 +08:00
Haitao Pan
5fa35235e1 refactor(acp): reorganize ACP roles and unify ingress under xworkmate-bridge
- Rename acp_codex, acp_gemini, acp_opencode roles to acp_server_*
- Consolidate ACP deployment logic into xworkmate_bridge role
- Introduce gateway_openclaw role for ingress management
- Update playbooks to use the refactored xworkmate_bridge role
- Unify domain and upstream configuration under xworkmate-bridge.svc.plus
2026-04-18 14:30:39 +08:00
Haitao Pan
ae1d318332 feat(bridge): templatize runtime configuration and add deployment tasks for xworkmate_bridge role 2026-04-18 12:17:32 +08:00
Haitao Pan
cd92dbc20d chore(domain): complete migration from acp-server.svc.plus to unified xworkmate-bridge.svc.plus 2026-04-18 11:42:57 +08:00
Haitao Pan
1cbe937178 refactor(summary): update deployment summary URLs to match unified bridge paths 2026-04-18 11:37:44 +08:00
Haitao Pan
c82c93d9ff fix(validate): update Caddy fragment path and remove stale file checks 2026-04-18 11:16:53 +08:00
Haitao Pan
74384140e2 refactor(validate): use global xworkmate_bridge_auth_token variable for authentication headers 2026-04-18 10:33:08 +08:00
Haitao Pan
e1a29dc4a0 fix(validate): add Authorization header to bridge and acp ingress checks 2026-04-18 10:31:52 +08:00
Haitao Pan
26499f5602 Add docs.svc.plus deployment playbook 2026-04-14 18:21:01 +08:00
Haitao Pan
c0f1a1c2ee Deploy billing-service from build artifact 2026-04-12 19:05:17 +08:00
Haitao Pan
97d49eaf39 deploy: pass bridge upstream token into runtime 2026-04-12 18:52:53 +08:00
Haitao Pan
27e19c4457 deploy: validate bridge ping over public https 2026-04-12 18:47:33 +08:00
Haitao Pan
9cc0e6bfb8 deploy: allow minimal caddy base config 2026-04-12 18:23:01 +08:00
Haitao Pan
220203b133 deploy: align console ingress and dns contract 2026-04-12 18:14:28 +08:00
Haitao Pan
04fb63881c fix accounts service ghcr login 2026-04-12 17:57:40 +08:00
80c545a95c
Merge pull request #6 from x-evor/codex/multi-node-billing-ingestion
Codex/multi node billing ingestion
2026-04-12 15:56:26 +08:00
Haitao Pan
130932fc6f deploy: run xworkmate bridge from compose image 2026-04-12 15:00:43 +08:00
334 changed files with 16793 additions and 1705 deletions

View File

@ -0,0 +1,44 @@
name: Validate Release PR
# Release-policy gate for release/* branches: accept only PRs from hotfix/* or PRs labeled cherry-pick/backport.
# See iac_modules/docs/tldr-github-branch-model.md for details.
on:
  pull_request_target:
    types: [opened, synchronize, reopened, labeled, unlabeled]
permissions:
  contents: read
  pull-requests: read
jobs:
  validate-release-source:
    runs-on: ubuntu-latest
    if: startsWith(github.base_ref, 'release/')
    steps:
      - name: Check PR source branch
        # Pass untrusted PR metadata through env instead of interpolating it into
        # the script, so a crafted branch or label name cannot inject shell code.
        env:
          SRC: ${{ github.head_ref }}
          TGT: ${{ github.base_ref }}
          LABELS: ${{ join(github.event.pull_request.labels.*.name, ',') }}
        run: |
          echo "🔍 Validating PR into release branch"
          echo " source: $SRC"
          echo " target: $TGT"
          echo " labels: $LABELS"
          if [[ "$SRC" =~ ^hotfix/ ]]; then
            echo "✅ Allowed: hotfix/* branch"
            exit 0
          fi
          if [[ "$LABELS" =~ (^|,)(cherry-pick|backport)(,|$) ]]; then
            echo "✅ Allowed: cherry-pick/backport labeled PR"
            exit 0
          fi
          echo "❌ Rejected."
          echo "release/* only accepts:"
          echo " - PRs from hotfix/*"
          echo " - PRs labeled cherry-pick or backport (backport/cherry-pick of an already verified feature)"
          echo "Merging directly from main / develop / feature/* into release/* is not allowed."
          exit 1
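The gate above boils down to two checks, which can be sketched outside the workflow for local testing. This is a minimal illustration only; the function name `is_allowed_release_pr` is hypothetical and not part of the repository.

```python
import re

# Same pattern as the workflow: a label must equal "cherry-pick" or "backport"
# exactly, once the labels are joined with commas.
LABEL_RE = re.compile(r"(^|,)(cherry-pick|backport)(,|$)")


def is_allowed_release_pr(source_branch: str, labels: list[str]) -> bool:
    """Return True when a PR into release/* passes the gate (hypothetical helper)."""
    if source_branch.startswith("hotfix/"):
        return True
    return bool(LABEL_RE.search(",".join(labels)))


if __name__ == "__main__":
    print(is_allowed_release_pr("hotfix/fix-login", []))
    print(is_allowed_release_pr("feature/new-ui", ["bug", "backport"]))
    print(is_allowed_release_pr("main", ["backport-candidate"]))
```

Anchoring the label pattern on commas is what keeps a label such as `backport-candidate` from passing by accident.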

3
.gitignore vendored
View File

@ -1,5 +1,6 @@
xfce-secrets.yml
inventory/__pycache__/
.playwright-mcp/
.env
.artifacts/
.artifacts/acp_codex/xworkmate-go-core

3
.gitleaksignore Normal file
View File

@ -0,0 +1,3 @@
dcdc9bea7b49f045e1ac0a30f85a5e0c84c1e8db:group_vars/xworkmate_bridge_distributed.yml:generic-api-key:41
ba4daa35977d3c7aaecc1f9dd42a6dc41794d04c:group_vars/xworkmate_bridge_distributed.yml:generic-api-key:35
126a19e2828f52b2a510e107ef66a9ef1d1e88cf:docs/tldr-ssh-security.md:hashicorp-tf-password:78

View File

@ -1,5 +1,20 @@
# playbooks
## XWorkmate Bridge Distributed VPN
The bidirectional WireGuard-over-VLESS transport for the two XWorkmate bridge
nodes is deployed by:
```bash
ansible-playbook -i inventory.ini vpn-wireguard-over-vless.yml
```
The implementation uses split bridge groups (`xworkmate_bridge` and
`cn_xworkmate_bridge`) under `xworkmate_bridge_distributed`, stores private keys
and the shared management-side Xray UUID in `https://vault.svc.plus`, and keeps
the host's default `xray.service` untouched. The runbook lives in
[`roles/vhosts/xworkmate_bridge_distributed_vpn/README.md`](roles/vhosts/xworkmate_bridge_distributed_vpn/README.md).
## Cloud Dev Desktop
The cloud dev desktop flow lives here as two playbooks:

View File

@ -1,4 +1,5 @@
[defaults]
allow_world_readable_tmpfiles = True
# Common settings
# Default inventory file path; change as needed
inventory = ./inventory.ini
@ -7,6 +8,9 @@ forks = 10
poll_interval = 10
transport = smart
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 3600
# Output: use the callback built into ansible-core so lightweight CI environments do not need extra plugins
stdout_callback = default
@ -24,3 +28,6 @@ deprecation_warnings = False
cache = True
cache_plugin = jsonfile
cache_timeout = 3600
[ssh_connection]
pipelining = True

28
api.plist.j2 Normal file
View File

@ -0,0 +1,28 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>plus.svc.xworkspace.api</string>
<key>ProgramArguments</key>
<array>
<string>/bin/bash</string>
<string>-c</string>
<string>
source "{{ xworkspace_console_config_dir }}/portal.env"
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"
exec {{ xworkspace_console_api_exec }}
</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>WorkingDirectory</key>
<string>{{ xworkspace_console_api_working_dir }}</string>
<key>StandardOutPath</key>
<string>{{ ansible_env.HOME }}/.local/state/xworkspace/api.log</string>
<key>StandardErrorPath</key>
<string>{{ ansible_env.HOME }}/.local/state/xworkspace/api.err.log</string>
</dict>
</plist>

29
console.plist.j2 Normal file
View File

@ -0,0 +1,29 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>plus.svc.xworkspace.console</string>
<key>ProgramArguments</key>
<array>
<string>/bin/bash</string>
<string>-c</string>
<string>
export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:$PATH"
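# The prebuilt runtime ships only dashboard/dist (no package.json), and the dashboard is a
# single-page app without client-side routing, so serving dist statically with python3 is enough (macOS has no caddy).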
exec /usr/bin/env python3 -m http.server {{ xworkspace_console_port }} --bind 127.0.0.1 --directory "{{ xworkspace_console_dashboard_dir }}/dist"
</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>WorkingDirectory</key>
<string>{{ xworkspace_console_dashboard_dir }}/dist</string>
<key>StandardOutPath</key>
<string>{{ ansible_env.HOME }}/.local/state/xworkspace/console.log</string>
<key>StandardErrorPath</key>
<string>{{ ansible_env.HOME }}/.local/state/xworkspace/console.err.log</string>
</dict>
</plist>

11
deploy_QMD.yml Normal file
View File

@ -0,0 +1,11 @@
---
- name: Deploy QMD extended memory
hosts: "{{ qmd_hosts | default('all') }}"
become: true
gather_facts: true
module_defaults:
ansible.builtin.apt:
lock_timeout: "{{ ai_workspace_apt_lock_timeout | default(900) | int }}"
roles:
- role: roles/vhosts/qmd/
tags: [qmd]

View File

@ -4,9 +4,11 @@
become: true
gather_facts: true
roles:
- role: roles/vhosts/deploy_acp_vhosts/
- role: roles/vhosts/acp_server_codex/
tags: [acp_codex]
- role: roles/vhosts/xworkmate_bridge/
vars:
deploy_acp_codex: true
deploy_acp_opencode: false
deploy_acp_gemini: false
tags: [deploy_acp_vhosts, acp_codex]
tags: [xworkmate_bridge, acp_codex]

View File

@ -4,9 +4,11 @@
become: true
gather_facts: true
roles:
- role: roles/vhosts/deploy_acp_vhosts/
- role: roles/vhosts/acp_server_gemini/
tags: [acp_gemini]
- role: roles/vhosts/xworkmate_bridge/
vars:
deploy_acp_codex: false
deploy_acp_opencode: false
deploy_acp_gemini: true
tags: [deploy_acp_vhosts, acp_gemini]
tags: [xworkmate_bridge, acp_gemini]

View File

@ -4,9 +4,11 @@
become: true
gather_facts: true
roles:
- role: roles/vhosts/deploy_acp_vhosts/
- role: roles/vhosts/acp_server_opencode/
tags: [acp_opencode]
- role: roles/vhosts/xworkmate_bridge/
vars:
deploy_acp_codex: false
deploy_acp_opencode: true
deploy_acp_gemini: false
tags: [deploy_acp_vhosts, acp_opencode]
tags: [xworkmate_bridge, acp_opencode]

9
deploy_agent_hermes.yml Normal file
View File

@ -0,0 +1,9 @@
---
- name: Deploy Hermes ACP agent adapter
hosts: "{{ acp_hermes_hosts | default('all') }}"
become: true
gather_facts: true
roles:
- role: roles/vhosts/acp_server_hermes/
tags: [acp_hermes, hermes]

View File

@ -3,9 +3,12 @@
gather_facts: true
become: true
vars:
billing_service_source_dir: >-
{{ lookup('ansible.builtin.env', 'BILLING_SERVICE_SOURCE_DIR')
| default(playbook_dir ~ '/../billing-service', true) }}
billing_service_binary_artifact: >-
{{ lookup('ansible.builtin.env', 'BILLING_SERVICE_BINARY_ARTIFACT')
| default('', true) }}
billing_service_image_ref: >-
{{ lookup('ansible.builtin.env', 'BILLING_SERVICE_IMAGE_REF')
| default('', true) }}
billing_service_exporter_base_url: >-
{{ lookup('ansible.builtin.env', 'EXPORTER_BASE_URL')
| default('http://127.0.0.1:8080', true) }}
@ -39,6 +42,35 @@
{{ lookup('ansible.builtin.env', 'INITIAL_BALANCE')
| default('0', true) }}
pre_tasks:
- name: Validate BILLING_SERVICE_BINARY_ARTIFACT is present
ansible.builtin.assert:
that:
- billing_service_binary_artifact | length > 0
fail_msg: "BILLING_SERVICE_BINARY_ARTIFACT must be exported before running this playbook."
success_msg: "BILLING_SERVICE_BINARY_ARTIFACT found"
- name: Validate BILLING_SERVICE_BINARY_ARTIFACT exists
ansible.builtin.stat:
path: "{{ billing_service_binary_artifact }}"
register: billing_service_binary_artifact_stat
delegate_to: localhost
become: false
run_once: true
- name: Assert BILLING_SERVICE_BINARY_ARTIFACT exists on controller
ansible.builtin.assert:
that:
- billing_service_binary_artifact_stat.stat.exists
- billing_service_binary_artifact_stat.stat.isreg
fail_msg: "BILLING_SERVICE_BINARY_ARTIFACT must point to an existing binary artifact."
success_msg: "BILLING_SERVICE_BINARY_ARTIFACT exists"
delegate_to: localhost
become: false
run_once: true
- name: Validate BILLING_SERVICE_IMAGE_REF is present
ansible.builtin.assert:
that:
- billing_service_image_ref | length > 0
fail_msg: "BILLING_SERVICE_IMAGE_REF must be exported before running this playbook."
success_msg: "BILLING_SERVICE_IMAGE_REF found"
- name: Validate DATABASE_URL is present
ansible.builtin.assert:
that:

View File

@ -4,6 +4,7 @@
become: true
roles:
- roles/vhosts/docker
- roles/vhosts/caddy
- roles/vhosts/console_service
- name: Sync console DNS records when requested

58
deploy_docs_svc_plus.yml Normal file
View File

@ -0,0 +1,58 @@
- name: Deploy managed docs.svc.plus service
hosts: "{{ docs_service_target_host | default(docs_service_hosts | default('docs')) }}"
gather_facts: true
become: true
vars:
docs_service_image_ref: >-
{{
(lookup('ansible.builtin.env', 'DOCS_IMAGE_REF') | default('', true) | trim)
or
(
(lookup('ansible.builtin.env', 'DOCS_IMAGE_REPO') | default('ghcr.io/x-evor/docs', true))
~ ':'
~ (lookup('ansible.builtin.env', 'DOCS_IMAGE_TAG') | default('latest', true))
)
}}
docs_service_image_repo: >-
{{ lookup('ansible.builtin.env', 'DOCS_IMAGE_REPO')
| default('ghcr.io/x-evor/docs', true) }}
docs_service_image_tag: >-
{{ lookup('ansible.builtin.env', 'DOCS_IMAGE_TAG')
| default('latest', true) }}
docs_service_pull_image: >-
{{ lookup('ansible.builtin.env', 'DOCS_PULL_IMAGE')
| default(true, true) | bool }}
docs_service_knowledge_repo_path_host: >-
{{ lookup('ansible.builtin.env', 'DOCS_KNOWLEDGE_REPO_PATH_HOST')
| default('', true) }}
docs_service_internal_service_token: >-
{{
lookup('ansible.builtin.env', 'DOCS_INTERNAL_SERVICE_TOKEN')
| default(lookup('ansible.builtin.env', 'INTERNAL_SERVICE_TOKEN') | default('', true), true)
}}
docs_service_reload_interval: >-
{{ lookup('ansible.builtin.env', 'DOCS_RELOAD_INTERVAL')
| default('5m', true) }}
docs_service_container_port: >-
{{ lookup('ansible.builtin.env', 'DOCS_SERVICE_PORT')
| default('8084', true) }}
docs_service_host_port: >-
{{ lookup('ansible.builtin.env', 'DOCS_HOST_PORT')
| default('18086', true) }}
roles:
- roles/vhosts/docker
- roles/vhosts/caddy
- roles/vhosts/docs_service
- name: Sync docs DNS records when requested
hosts: localhost
connection: local
gather_facts: false
tasks:
- name: Reconcile Cloudflare DNS for docs target host
when: docs_service_sync_dns | default(false)
ansible.builtin.include_role:
name: cloudflare_svc_plus_dns
vars:
cloudflare_dns_source_hosts:
- "{{ docs_service_target_host | default(docs_service_hosts | default('docs')) }}"

View File

@ -0,0 +1,11 @@
---
- name: Deploy OpenClaw gateway vhost
hosts: "{{ gateway_openclaw_hosts | default('all') }}"
become: true
gather_facts: true
module_defaults:
ansible.builtin.apt:
lock_timeout: "{{ ai_workspace_apt_lock_timeout | default(900) | int }}"
roles:
- role: roles/vhosts/gateway_openclaw/
tags: [gateway_openclaw, openclaw]

View File

@ -0,0 +1,10 @@
---
- name: Deploy Modern IT History Docusaurus ebook
hosts: "{{ modern_it_history_target_host | default('jp_xhttp_contabo_host') }}"
gather_facts: true
become: true
vars:
nodejs_version: "24.x"
roles:
- roles/vhosts/nodejs
- roles/vhosts/modern_it_history

View File

@ -1,8 +0,0 @@
- name: Deploy PostgreSQL on vhosts
hosts: "{{ postgresql_target | default('postgresql') }}"
become: true
vars:
group: "{{ group | default(postgresql_target | default('postgresql')) }}"
roles:
- roles/vhosts/common/
- roles/vhosts/postgres/

View File

@ -3,6 +3,17 @@
hosts: "{{ xworkmate_bridge_hosts | default('all') }}"
become: true
gather_facts: true
module_defaults:
ansible.builtin.apt:
lock_timeout: "{{ ai_workspace_apt_lock_timeout | default(900) | int }}"
roles:
- role: roles/vhosts/deploy_acp_vhosts/
tags: [deploy_acp_vhosts]
- role: roles/vhosts/acp_server_codex/
tags: [acp_codex]
- role: roles/vhosts/acp_server_opencode/
tags: [acp_opencode]
- role: roles/vhosts/acp_server_gemini/
tags: [acp_gemini]
- role: roles/vhosts/acp_server_hermes/
tags: [acp_hermes]
- role: roles/vhosts/xworkmate_bridge/
tags: [xworkmate_bridge]

View File

@ -0,0 +1,170 @@
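# AI Workspace Runtime Delivery Plan
## 1. Goals and Scope
This plan defines the complete delivery chain for the AI Workspace core runtime: building from the source repositories, publishing, offline aggregation, and deployment to target hosts.
Core principles:
- LiteLLM, xworkspace-console, xworkmate-bridge, and QMD are each built by the GitHub Actions build job of their own source repository.
- Each component independently publishes a `runtime-*` GitHub Release together with its SHA256 checksum list.
- The offline package only downloads published artifacts and aggregates them after verifying the SHA256 of every file.
- Target hosts may only verify, unpack, install, configure, start, and health-check; compiling from source, building dependencies, and building images are prohibited.
- Any capability that has not been exercised by CI or by the target-host matrix stays `TODO`; it must not be marked done on the basis of design or partial implementation alone.
## 2. Target Architecture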
```text
LiteLLM repository ---------- build job --> runtime-litellm-* ----------\
xworkspace-console repository build job --> runtime-xworkspace-console-* --\
xworkmate-bridge repository -- build job --> runtime-xworkmate-bridge-* -----+--> offline package job
QMD repository -------------- build job --> runtime-qmd-* ------------------/ |
| download
| SHA256 verify
| manifest aggregate
v
offline-package-*
|
v
target host: verify/install only
```
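### 2.1 Component Releases
Each of the four components must be built by its own repository; the aggregation repository must not build a component from source on its behalf.
| Component | Build responsibility | Release naming | Required artifacts |
| --- | --- | --- | --- |
| LiteLLM | GitHub Actions build job in the LiteLLM repository | `runtime-litellm-*` | Pinned-version Python runtime/dependency bundle, startup entry point, component manifest, SHA256 checksum list |
| xworkspace-console | GitHub Actions build job in the xworkspace-console repository | `runtime-xworkspace-console-*` | Dashboard static assets, API binary, runtime configuration templates, component manifest, SHA256 checksum list |
| xworkmate-bridge | GitHub Actions build job in the xworkmate-bridge repository | `runtime-xworkmate-bridge-*` | Bridge binary, systemd/runtime configuration templates, component manifest, SHA256 checksum list |
| QMD | GitHub Actions build job in the QMD repository | `runtime-qmd-*` | Installed dependencies and built CLI/runtime, component manifest, SHA256 checksum list |
Asset file names must match exactly; neither the aggregator nor the target host may try aliases, fuzzy matching, or compatibility guesses:
- Console: `xworkspace-console-runtime-linux-{amd64|arm64}.tar.gz`
- Bridge: `xworkmate-bridge-linux-{amd64|arm64}.tar.gz`
- QMD: `qmd-runtime-linux-{amd64|arm64}.tar.gz`
- LiteLLM: `litellm-runtime-{distro}-{version}-{arch}.tar.gz`
Each component manifest records at least: component name, source commit, version, build time, target OS, target architecture, entry-point file, file list, and the SHA256 of every file.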
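The exact-match rule for asset names can be illustrated with a small lookup helper. This is a sketch under stated assumptions: the short component keys (`console`, `bridge`, `qmd`, `litellm`) and both function names are hypothetical, and only the file-name patterns come from the list above.

```python
SUPPORTED_ARCHES = ("amd64", "arm64")


def asset_name(component: str, arch: str, distro: str = "", version: str = "") -> str:
    """Build the one file name that is accepted for a component (hypothetical helper)."""
    if arch not in SUPPORTED_ARCHES:
        raise ValueError(f"unsupported architecture: {arch}")
    if component == "console":
        return f"xworkspace-console-runtime-linux-{arch}.tar.gz"
    if component == "bridge":
        return f"xworkmate-bridge-linux-{arch}.tar.gz"
    if component == "qmd":
        return f"qmd-runtime-linux-{arch}.tar.gz"
    if component == "litellm":
        if not distro or not version:
            raise ValueError("LiteLLM assets need a distro and a version")
        return f"litellm-runtime-{distro}-{version}-{arch}.tar.gz"
    raise ValueError(f"unknown component: {component}")


def find_asset(expected: str, published: list[str]) -> str:
    """Exact match only: no aliases, no fuzzy matching, no guessing."""
    if expected not in published:
        raise LookupError(f"release does not contain {expected}")
    return expected
```

Raising on a near miss instead of picking the closest name keeps a misnamed asset from being deployed silently.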
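### 2.2 Offline Package Aggregation
The offline package job must:
1. Download the artifacts and SHA256 checksum lists that match the target platform from the four components' `runtime-*` Releases.
2. Verify SHA256 before aggregating; fail immediately if a checksum list is missing, a file is missing, or a digest does not match.
3. Generate an aggregate manifest that pins the Release tag, source commit, asset URL, asset size, and SHA256 of each of the four components.
4. Package the verified component artifacts, the dependencies required by the deployment playbooks, and the aggregate manifest as `offline-package-*`.
5. Generate a SHA256 for the final offline package and run one unpack and structure check in CI.
`latest` must not be used as an untraceable deployment input; re-aggregation must be based on an explicit tag or an immutable commit.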
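The verify-before-aggregate step can be sketched as follows. It assumes the checksum list uses the `sha256sum` text format (`<hex digest>  <file name>` per line), which the plan does not specify; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_list(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines (assumed format) into {file name: digest}."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify_assets(asset_dir: Path, checksum_text: str) -> None:
    """Fail fast on a missing list, a missing file, or a digest mismatch."""
    expected = parse_checksum_list(checksum_text)
    if not expected:
        raise ValueError("checksum list is missing or empty")
    for name, digest in expected.items():
        path = asset_dir / name
        if not path.is_file():
            raise FileNotFoundError(f"missing asset: {name}")
        actual = sha256_of(path)
        if actual != digest:
            raise ValueError(f"SHA256 mismatch for {name}: {actual} != {digest}")
```

Aggregation would only start after `verify_assets` returns without raising for every component.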
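### 2.3 Target Host Deployment
Target host deployment must enforce the prebuilt-only constraint. If any prebuilt artifact is missing, the deployment fails immediately and must not fall back to any of the following:
- `git clone` or a source checkout;
- `npm install`, `npm run build`, `go build`, `go run`;
- `pip install` that resolves or builds dependencies from the public internet or from source;
- `docker build`, `podman build`, or any other local image build;
- any installation step that needs a compiler, an SDK, or a front-end build toolchain.
Deployment performs only: offline package SHA256 verification, manifest verification, unpacking, file installation, permission setup, configuration rendering, service startup, health checks, and result summary.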
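One way to picture the prebuilt-only guard is a deny list checked against each command before it runs. This is a deliberately simplified sketch: the function name is hypothetical, the list covers only the commands named above, and a real guard would also have to handle `pip install`, wrappers, and indirect invocations.

```python
import shlex

# Command prefixes named in the list above; matching is on leading arguments only.
FORBIDDEN_PREFIXES = [
    ("git", "clone"),
    ("npm", "install"),
    ("npm", "run", "build"),
    ("go", "build"),
    ("go", "run"),
    ("docker", "build"),
    ("podman", "build"),
]


def violates_prebuilt_only(command: str) -> bool:
    """Return True when the command starts with a forbidden build step."""
    argv = tuple(shlex.split(command))
    return any(argv[: len(prefix)] == prefix for prefix in FORBIDDEN_PREFIXES)


if __name__ == "__main__":
    for cmd in ("tar -xzf qmd-runtime-linux-amd64.tar.gz", "npm run build"):
        print(cmd, "->", "blocked" if violates_prebuilt_only(cmd) else "ok")
```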
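## 3. Resource and Performance Constraints
### 3.1 Concurrency Control
- The global concurrency hard cap must satisfy `concurrency <= 2 * online CPUs`, where the online CPU count is the number of CPUs actually available at execution time.
- Initial concurrency is the minimum of the task limit, the configured limit, and `2 * online CPUs`.
- The scheduler must shrink dynamically with load: when load exceeds the threshold, it stops dispatching new tasks and lowers concurrency step by step; it scales back up slowly only after load has recovered and stayed stable.
- Dynamic shrinking must not interrupt non-reentrant installation steps that are already running; it only limits the admission of subsequent tasks.
- Logs and the final summary must record the CPU count, the load samples, and the time, reason, and before/after values of every concurrency adjustment.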
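The cap and the shrink rule can be sketched as two small functions. The thresholds (`high`, `low`), the halving step, and the one-step recovery are illustrative assumptions only; the plan leaves the sampling window, thresholds, and hysteresis as open TODO items.

```python
import os


def online_cpus() -> int:
    """CPUs actually available to this process (falls back to os.cpu_count())."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def initial_concurrency(task_limit: int, configured_limit: int, cpus: int) -> int:
    """Minimum of the task limit, the configured limit, and 2 * online CPUs."""
    return max(1, min(task_limit, configured_limit, 2 * cpus))


def next_concurrency(current: int, hard_cap: int, load1: float, cpus: int,
                     high: float = 1.5, low: float = 0.7, floor: int = 1) -> int:
    """Shrink quickly under load, recover one step at a time (assumed policy)."""
    per_cpu = load1 / cpus
    if per_cpu > high:
        return max(floor, current // 2)
    if per_cpu < low:
        return min(hard_cap, current + 1)
    return current


if __name__ == "__main__":
    cpus = online_cpus()
    print("online CPUs:", cpus, "initial:", initial_concurrency(16, 8, cpus))
```

The returned value would only gate the admission of new tasks; running steps are left to finish, as required above.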
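### 3.2 Deployment Duration Distribution
Every deployment must record the total duration and at least the following per-stage durations:
- offline package download;
- SHA256 and manifest verification;
- unpacking;
- installation of each component;
- configuration rendering;
- service startup;
- health checks.
CI/acceptance reports group results by OS, architecture, cold start versus cache hit, and first run versus idempotent rerun, and report the sample count, minimum, maximum, mean, and P50, P90, P95, and P99. When there are too few samples, keep the raw data and state explicitly that a single run's duration is not a substitute for a distribution.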
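The required statistics can be computed with the standard library alone. This sketch uses the nearest-rank percentile definition, which is an assumption; the plan does not say which percentile method the reports should use.

```python
import statistics


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile on an ascending list (p is an integer 1..100)."""
    rank = max(1, (p * len(sorted_values) + 99) // 100)  # integer ceil(p * n / 100)
    return sorted_values[rank - 1]


def summarize(durations: list[float]) -> dict[str, float]:
    """Return count/min/max/avg/P50/P90/P95/P99 for one group of samples."""
    values = sorted(durations)
    if not values:
        raise ValueError("no samples")
    summary = {
        "count": len(values),
        "min": values[0],
        "max": values[-1],
        "avg": statistics.fmean(values),
    }
    for p in (50, 90, 95, 99):
        summary[f"p{p}"] = percentile(values, p)
    return summary
```

One `summarize` call per group (for example per OS, architecture, and first run versus rerun) yields the table the report needs; with very few samples the percentiles collapse onto the same values, which is why the raw data should be kept alongside.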
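## 4. Support Matrix and Acceptance
The target is to support all of the following combinations:
| Distribution | Version | Architecture |
| --- | --- | --- |
| Debian | 11, 12, 13 | amd64, arm64 |
| Ubuntu | 22.04, 24.04, 26.04 | amd64, arm64 |
Every matrix entry must verify that:
1. The offline package downloads and passes SHA256 verification.
2. Deployment succeeds on a target host with no source code, no build toolchain, and restricted outbound network access for the components.
3. The four component versions match the aggregate manifest exactly.
4. Service startup, health checks, and the key smoke tests succeed.
5. The same host is deployed at least twice in a row with the same inputs; the second run succeeds with no unexpected changes, no duplicate resources, no credential rotation, and no build activity.
6. Both the first deployment and the idempotent rerun produce per-stage durations and a complete summary.
Until Ubuntu 26.04 has been validated on actually available runners/images and against its dependency ecosystem, it remains planned support only and must not be marked as verified.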
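## 5. Current State
The statuses below record only facts that the current repository or adjacent delivery documents can prove; the target design is not treated as done:
- [x] The aggregate entry point has been split into preflight and runtime playbooks; preflight already validates the combination of the `docker`, `k3s`, and `systemd` run modes.
- [x] The xworkspace-console and QMD deployment code already has a prebuilt archive input and a prebuilt-only path that fails when the package is missing.
- [x] The adjacent one-click deployment document records that the xworkspace-console offline package `publish-release` pipeline and the Release artifact upload were checked and found complete.
- [x] The adjacent one-click deployment document records that the one-click install script prefers the offline installation package.
- [ ] xworkspace-console and QMD still fall back to source checkout/dependency installation/build on the target host, so "no building on the target host" is not yet met.
- [ ] LiteLLM can currently override the package spec, but its standalone `runtime-litellm-*` Release and a fully offline, build-free installation chain are unproven.
- [ ] The standalone `runtime-xworkmate-bridge-*` Release and the prebuilt consumption chain for xworkmate-bridge have not been verified within the scope of this plan.
- [ ] The consistent naming, manifest, and SHA256 contract for the four component Releases has not been verified.
- [ ] Per-file download, SHA256 verification, the aggregate manifest, and final package verification for the offline package have not been verified.
- [ ] The concurrency hard cap, load-based dynamic shrinking, and adjustment logging have not been verified.
- [ ] Deployment duration distribution statistics have not been verified.
- [ ] Idempotency acceptance through consecutive repeated runs is not complete.
- [ ] The full amd64/arm64 matrix for Debian 11/12/13 and Ubuntu 22.04/24.04/26.04 has not been verified.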
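## 6. TODO
### P0: Build and Release Loop
- [ ] TODO: Set up a build job in the LiteLLM repository that publishes the `runtime-litellm-*` Release, the component manifest, and the SHA256 checksum list.
- [ ] TODO: Harden the build job in the xworkspace-console repository and confirm that every release publishes the `runtime-xworkspace-console-*` Release, the component manifest, and the SHA256 checksum list.
- [ ] TODO: Set up a build job in the xworkmate-bridge repository that publishes the `runtime-xworkmate-bridge-*` Release, the component manifest, and the SHA256 checksum list.
- [ ] TODO: Set up a build job in the QMD repository that publishes the `runtime-qmd-*` Release, the component manifest, and the SHA256 checksum list.
- [ ] TODO: Produce installable assets for amd64 and arm64 separately; if an asset is distribution-specific, split it along the support matrix and state the compatibility range in the manifest.
- [ ] TODO: Add Release contract tests that reject any release missing an entry point, manifest, SHA256, or architecture asset.
### P0: Offline Aggregation and Build-Free Target Hosts
- [ ] TODO: Implement the offline package job: download the four component Releases by pinned tag and verify SHA256 file by file before aggregating.
- [ ] TODO: Generate a traceable aggregate manifest, and generate and publish a SHA256 for the final `offline-package-*`.
- [ ] TODO: Enforce prebuilt-only at the target-host deployment entry point; remove or disable every source-build fallback for the four components.
- [ ] TODO: Add a "no building on the target host" guard that fails on any compiler invocation, package build command, source checkout, or image build.
- [ ] TODO: Complete end-to-end deployment validation on a target host that is offline or may only reach the offline package source.
### P1: Concurrency, Performance, and Observability
- [ ] TODO: Implement online CPU detection and the global `<= 2 * online CPUs` concurrency hard limit.
- [ ] TODO: Define the load sampling window, the shrink/recovery thresholds, the hysteresis policy, and the minimum concurrency, and complete the dynamic-shrink tests.
- [ ] TODO: Record per-stage durations, per-component durations, concurrency changes, and environment labels; emit structured JSON and a human-readable summary.
- [ ] TODO: Aggregate the deployment duration distribution, reporting at least count/min/max/avg/P50/P90/P95/P99, and distinguish first runs from idempotent reruns.
### P1: Idempotency and Platform Matrix
- [ ] TODO: Run each support-matrix entry at least twice in a row and verify that the second run has no unexpected changed tasks, service interruptions, duplicate resources, or credential changes.
- [ ] TODO: Cover Debian 11/12/13 on amd64/arm64.
- [ ] TODO: Cover Ubuntu 22.04/24.04/26.04 on amd64/arm64.
- [ ] TODO: Save the Release tag, offline package SHA256, deployment log, timing data, and acceptance verdict for every matrix entry.
- [ ] TODO: Update "planned support" to "verified support" only after the whole matrix passes; on a partial pass, record the results entry by entry and make no overall completion claim.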

View File

@ -0,0 +1,576 @@
# LiteLLM Gateway Deployment Guide
## Target Architecture
```
┌─────────────────────────────────────────┐
│ Caddy (HTTPS Entry) │
│ │
│ ┌──────────────────────────────────┐ │
Internet ──────────►│ │ api.svc.plus/v1/openai/* │ │
│ │ api.svc.plus/v1/anthropic/* │ │
│ │ api.svc.plus/ui/* │ │
│ └──────────────────────────────────┘ │
└──────────────────┬──────────────────────┘
┌──────────────────▼──────────────────────┐
│ LiteLLM Proxy (127.0.0.1:4000) │
│ │
│ ┌──────────────────────────────────┐ │
│ │ /v1/chat/completions (OpenAI) │ │
│ │ /v1/messages (Anthropic) │ │
│ │ /ui (Admin Dashboard) │ │
│ └──────────────────────────────────┘ │
└──────────────────┬──────────────────────┘
┌──────────────────▼──────────────────────┐
│ Model Providers (External) │
│ │
│ • OpenAI (GPT-4o-mini) │
│ • Anthropic (Claude 3.5 Sonnet) │
│ • DeepSeek (deepseek-chat) │
│ • Local Models (OAI-compatible) │
└─────────────────────────────────────────┘
```
## Recommended Directory Layout
```
/etc/litellm/
├── config.yaml # LiteLLM configuration file
└── litellm.env # Environment variables (contains API keys)
/etc/systemd/system/
└── litellm-proxy.service # systemd service unit
/etc/caddy/conf.d/
└── litellm.caddy # Caddy routing configuration
```
## 1. Example Caddyfile Configuration
```caddy
# /etc/caddy/conf.d/litellm.caddy
# API gateway + LiteLLM Admin UI (single entry point)
api.svc.plus {
    encode gzip zstd
    log {
        output file /var/log/caddy/litellm.access.log
    }
    # LiteLLM Admin UI, protected by Basic Auth.
    # Generate the hash with `caddy hash-password`. The user name "admin" and the
    # LITELLM_UI_PASSWORD_HASH environment variable are placeholders.
    # Caddy releases before 2.8 spell this directive `basicauth`.
    @ui_admin path /ui /ui/*
    handle @ui_admin {
        basic_auth bcrypt "LiteLLM Admin UI" {
            admin {$LITELLM_UI_PASSWORD_HASH}
        }
        reverse_proxy 127.0.0.1:4000
    }
    # OpenAI-compatible API: /v1/openai/* -> /v1/*
    # handle_path strips the /v1/openai prefix, then rewrite puts /v1 back in front.
    handle_path /v1/openai/* {
        rewrite * /v1{uri}
        reverse_proxy 127.0.0.1:4000 {
            flush_interval -1
            transport http {
                dial_timeout 30s
                read_timeout 600s
                write_timeout 600s
            }
        }
    }
    # Anthropic-compatible API: /v1/anthropic/* -> /v1/*
    handle_path /v1/anthropic/* {
        rewrite * /v1{uri}
        reverse_proxy 127.0.0.1:4000 {
            flush_interval -1
            transport http {
                dial_timeout 30s
                read_timeout 600s
                write_timeout 600s
            }
        }
    }
    # Catch-all proxy. reverse_proxy already sets X-Forwarded-For,
    # X-Forwarded-Proto, and X-Forwarded-Host, so no extra header block is needed.
    handle {
        reverse_proxy 127.0.0.1:4000
    }
}
```
### Key Path Mappings
| External path | Internal path | Description |
|---------------------------------------|--------------------------|--------------|
| `https://api.svc.plus/v1/openai/chat/completions` | `http://127.0.0.1:4000/v1/chat/completions` | OpenAI-compatible API |
| `https://api.svc.plus/v1/anthropic/messages` | `http://127.0.0.1:4000/v1/messages` | Anthropic-compatible API |
| `https://api.svc.plus/ui/*` | `http://127.0.0.1:4000/ui/*` | Admin UI (Basic Auth) |
| `https://api.svc.plus/v1/chat/completions` | `http://127.0.0.1:4000/v1/chat/completions` | Short-path compatibility (optional) |
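The mapping in the table can be expressed as a small pure function, which is handy for checking a rewrite rule against expected paths before touching the proxy. This is only an illustration of the intended mapping; `to_internal_path` is a hypothetical name and does not model query strings.

```python
# Provider prefixes that the gateway strips before forwarding to LiteLLM.
PROVIDER_PREFIXES = ("/v1/openai", "/v1/anthropic")


def to_internal_path(external_path: str) -> str:
    """Map an external request path to the path LiteLLM sees on 127.0.0.1:4000."""
    for prefix in PROVIDER_PREFIXES:
        if external_path == prefix or external_path.startswith(prefix + "/"):
            return "/v1" + external_path[len(prefix):]
    return external_path


if __name__ == "__main__":
    for path in ("/v1/openai/chat/completions", "/v1/anthropic/messages", "/ui/"):
        print(path, "->", to_internal_path(path))
```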
---
## 2. Example LiteLLM config.yaml
```yaml
# /etc/litellm/config.yaml
model_list:
  # OpenAI model
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  # Anthropic model
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: os.environ/ANTHROPIC_API_KEY
  # DeepSeek model
  - model_name: deepseek-chat
    litellm_params:
      model: deepseek/deepseek-chat
      api_key: os.environ/DEEPSEEK_API_KEY
  # Local OpenAI-compatible model
  - model_name: local-qwen
    litellm_params:
      model: openai/qwen
      api_base: http://127.0.0.1:8000/v1
      api_key: os.environ/LOCAL_MODEL_API_KEY
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  drop_rate_limit_requests: true
  set_verbose: false
router_settings:
  model_group_alias:
    gpt-4o-mini: gpt-4o-mini
    claude-sonnet: claude-sonnet
    deepseek-chat: deepseek-chat
  routing_strategy: latency-based-routing
  enable_pre_call_checks: false
  retry_after: 60
  num_retries: 3
litellm_settings:
  drop_params: true
  set_verbose: true
  request_timeout: 600
  telemetry: false
  max_parallel_requests: 1000
environment_variables:
  OPENAI_API_KEY: os.environ/OPENAI_API_KEY
  ANTHROPIC_API_KEY: os.environ/ANTHROPIC_API_KEY
  DEEPSEEK_API_KEY: os.environ/DEEPSEEK_API_KEY
  LOCAL_MODEL_API_KEY: os.environ/LOCAL_MODEL_API_KEY
  LITELLM_MASTER_KEY: os.environ/LITELLM_MASTER_KEY
```
---
## 3. Example litellm.env
```bash
# /etc/litellm/litellm.env
# API keys (read by LiteLLM from the environment)
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
DEEPSEEK_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LOCAL_MODEL_API_KEY=sk-local-placeholder
# LiteLLM master key (required; used for API authentication)
LITELLM_MASTER_KEY=your-secure-random-master-key-here-min-32-chars
# Optional settings
# LITELLM_SALT_KEY=your-salt-key
# DATABASE_URL=postgresql://user:pass@host:5432/litellm
```
**File permissions**: `chmod 600 /etc/litellm/litellm.env`
---
## 4. Example systemd Service Unit
```ini
# /etc/systemd/system/litellm-proxy.service
[Unit]
Description=LiteLLM Proxy Service
After=network.target
[Service]
Type=simple
User=ubuntu
Group=ubuntu
WorkingDirectory=/home/ubuntu
EnvironmentFile=/etc/litellm/litellm.env
ExecStart=/usr/local/bin/litellm \
--host 127.0.0.1 \
--port 4000 \
--config /etc/litellm/config.yaml
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=litellm-proxy
[Install]
WantedBy=multi-user.target
```
---
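## 5. Deployment Steps
### 1. Install Dependencies
```bash
# Install Python, pip, venv, and pipx
apt update && apt install -y python3 python3-pip python3-venv pipx
# Install LiteLLM with pipx (recommended). The proxy server needs the [proxy] extra;
# PIPX_BIN_DIR places the launcher at /usr/local/bin/litellm, the path the systemd unit uses.
PIPX_HOME=/opt/pipx PIPX_BIN_DIR=/usr/local/bin pipx install 'litellm[proxy]'
# Or install directly with pip (newer Debian/Ubuntu releases may refuse system-wide pip installs)
pip install 'litellm[proxy]'
```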
### 2. Create the Configuration Directory
```bash
mkdir -p /etc/litellm
chmod 755 /etc/litellm
```
### 3. Write the Configuration Files
```bash
# Write config.yaml
cat > /etc/litellm/config.yaml << 'EOF'
model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
# ... other models
EOF
# Write the environment file
cat > /etc/litellm/litellm.env << 'EOF'
OPENAI_API_KEY=sk-xxx
ANTHROPIC_API_KEY=sk-ant-xxx
DEEPSEEK_API_KEY=sk-xxx
LITELLM_MASTER_KEY=your-secure-master-key
EOF
chmod 600 /etc/litellm/litellm.env
# The service runs as user ubuntu, so give its group read access to config.yaml
chown root:ubuntu /etc/litellm/config.yaml
chmod 640 /etc/litellm/config.yaml
```
### 4. Deploy the systemd Service
```bash
cat > /etc/systemd/system/litellm-proxy.service << 'EOF'
[Unit]
Description=LiteLLM Proxy Service
After=network.target
[Service]
Type=simple
User=ubuntu
Group=ubuntu
WorkingDirectory=/home/ubuntu
EnvironmentFile=/etc/litellm/litellm.env
ExecStart=/usr/local/bin/litellm --host 127.0.0.1 --port 4000 --config /etc/litellm/config.yaml
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=litellm-proxy
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable litellm-proxy
systemctl start litellm-proxy
systemctl status litellm-proxy
```
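### 5. Configure Caddy
```bash
# Make sure Caddy imports the conf.d directory (append the import only once)
mkdir -p /etc/caddy/conf.d
grep -qxF 'import /etc/caddy/conf.d/*.caddy' /etc/caddy/Caddyfile \
  || echo 'import /etc/caddy/conf.d/*.caddy' >> /etc/caddy/Caddyfile
# Create the LiteLLM Caddy configuration
cat > /etc/caddy/conf.d/litellm.caddy << 'EOF'
# ... see the Caddyfile configuration above
EOF
# Validate and reload
caddy validate --config /etc/caddy/Caddyfile
systemctl reload caddy
```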
### 6. Verify the Deployment
```bash
# Check that LiteLLM is alive (unauthenticated liveness probe; /health itself requires the master key)
curl http://127.0.0.1:4000/health/liveliness
# Check the API gateway
curl -X POST "https://api.svc.plus/v1/openai/chat/completions" \
-H "Authorization: Bearer $LITELLM_MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-chat","messages":[{"role":"user","content":"Hello"}]}'
# Open the Admin UI
# https://api.svc.plus/ui/
```
---
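## 6. API Verification Commands
### 1. Health Check
```bash
# Local health check (unauthenticated liveness probe)
curl http://127.0.0.1:4000/health/liveliness
# External health check
curl https://api.svc.plus/health/liveliness
```
### 2. OpenAI-Compatible API Test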
```bash
curl -X POST "https://api.svc.plus/v1/openai/chat/completions" \
-H "Authorization: Bearer $LITELLM_MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-chat",
"messages": [
{
"role": "user",
"content": "Hello from OpenAI-compatible endpoint"
}
]
}'
```
### 3. Anthropic-Compatible API Test
```bash
curl -X POST "https://api.svc.plus/v1/anthropic/messages" \
-H "Authorization: Bearer $LITELLM_MASTER_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet",
"max_tokens": 256,
"messages": [
{
"role": "user",
"content": "Hello from Anthropic-compatible endpoint"
}
]
}'
```
### 4. Admin UI Access
```bash
# If Basic Auth is enabled:
# open https://api.svc.plus/ui/
# and log in with the configured admin user name and password
```
---
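## 7. Security Notes
### 1. Network Isolation
- **Port 4000 listens only on 127.0.0.1** and is not exposed to the public internet
- **Do not open port 4000** in the VPS firewall
- Expose only **443** (HTTPS) externally
- **Caddy is the only public entry point**
### 2. Admin UI Protection
The LiteLLM Admin UI **should never be left unprotected**; enable at least one of the following:
| Protection | Description |
|--------------|------------------------------|
| Basic Auth | Built into Caddy; configure a user name and password |
| IP allowlist | Allow only specific IPs to reach api.svc.plus/ui |
| Cloudflare Access | Cloudflare Zero Trust authentication |
| VPN / Tailscale | Access over a private network |
### 3. API Authentication
- All API calls must send `Authorization: Bearer <LITELLM_MASTER_KEY>`
- `LITELLM_MASTER_KEY` must be long and random (32+ characters recommended)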
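A key that meets the length and randomness recommendation can be generated with Python's `secrets` module. This is a minimal sketch: the helper name is hypothetical, and the `sk-` prefix is an assumption that follows the convention used in LiteLLM's own examples.

```python
import secrets


def generate_master_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random key; 32 random bytes encode to about 43 characters."""
    return "sk-" + secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into /etc/litellm/litellm.env as LITELLM_MASTER_KEY.
    print(generate_master_key())
```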
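### 4. File Permissions
```bash
chmod 600 /etc/litellm/litellm.env # Protect the API keys
chown root:ubuntu /etc/litellm/config.yaml # Let the service user (ubuntu) read the config
chmod 640 /etc/litellm/config.yaml # Configuration file
```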
---
## 8. Ansible Deployment Commands
```bash
# Deploy the LiteLLM gateway
ansible-playbook -i inventory.ini setup-litellm.yaml
# Deploy with explicit API keys
LITELLM_MASTER_KEY=your-secure-key \
OPENAI_API_KEY=sk-xxx \
ANTHROPIC_API_KEY=sk-ant-xxx \
DEEPSEEK_API_KEY=sk-xxx \
ansible-playbook -i inventory.ini setup-litellm.yaml
# Deploy only the Caddy configuration (without restarting LiteLLM)
ansible-playbook -i inventory.ini setup-litellm.yaml --tags litellm --start-at-task="Create LiteLLM Caddy fragment"
```
---
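## 9. Troubleshooting
### LiteLLM Service Fails to Start
```bash
# View the logs
journalctl -u litellm-proxy -f
# Validate the configuration
litellm --config /etc/litellm/config.yaml --test
```
### Caddy Configuration Is Invalid
```bash
# Validate the Caddy configuration
caddy validate --config /etc/caddy/Caddyfile
# View the Caddy access log (the file name set in litellm.caddy)
tail -f /var/log/caddy/litellm.access.log
```
### API Calls Fail
```bash
# Check the port binding
ss -tlnp | grep 4000
# Test local connectivity (unauthenticated liveness probe)
curl http://127.0.0.1:4000/health/liveliness
# Check the API key
source /etc/litellm/litellm.env
echo $LITELLM_MASTER_KEY
```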
---
## 10. Future Extensions
### Enable a PostgreSQL Database (for usage statistics, team management, etc.)
```bash
# 1. Install PostgreSQL
apt install -y postgresql postgresql-contrib
# 2. Create the database and user (run as root; no interactive shell needed)
runuser -u postgres -- psql -c "CREATE USER litellm WITH PASSWORD 'your-password';"
runuser -u postgres -- psql -c "CREATE DATABASE litellm OWNER litellm;"
# 3. Update the environment variables
echo "DATABASE_URL=postgresql://litellm:your-password@localhost:5432/litellm" >> /etc/litellm/litellm.env
# 4. Restart the service
systemctl restart litellm-proxy
```
### Integrate Vault (optional)
```bash
# Set the Vault environment variables
echo "VAULT_URL=https://vault.svc.plus" >> /etc/litellm/litellm.env
echo "VAULT_API_KEY_PATH=secret/litellm/api-keys" >> /etc/litellm/litellm.env
systemctl restart litellm-proxy
```
---
## 11. Agent integration
Each agent only needs to be configured with a Base URL:
| Agent type | Base URL | Auth |
|--------------|-----------------------------------|---------------|
| OpenAI SDK | `https://api.svc.plus/v1/openai` | `LITELLM_MASTER_KEY` |
| Anthropic SDK | `https://api.svc.plus/v1/anthropic` | `LITELLM_MASTER_KEY` |
| LiteLLM SDK | `https://api.svc.plus` | `LITELLM_MASTER_KEY` |
Example (Python):
```python
from openai import OpenAI

# Point the OpenAI SDK at the gateway instead of api.openai.com.
client = OpenAI(
    api_key="your-litellm-master-key",
    base_url="https://api.svc.plus/v1/openai"
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello"}]
)
```


@ -0,0 +1,128 @@
# AI Workspace one-click deployment and global network security guide
`setup-ai-workspace-all-in-one.yml` is an aggregate playbook that brings up, fully and automatically, the underlying components and services of the AI development environment on a target VPS.
> [!TIP]
> ## ⏳ TL;DR
>
> **One-click standard deployment (no prerequisites to configure; protected by randomly generated secrets)**
> ```bash
> curl -sfL https://raw.githubusercontent.com/ai-workspace-lab/xworkspace-console/main/scripts/setup-ai-workspace-all-in-one.sh | bash -
> ```
>
> **One-click strict deployment (disables every public-facing interface and enforces an internal-network/VPN-only architecture)**
> ```bash
> curl -sfL https://raw.githubusercontent.com/ai-workspace-lab/xworkspace-console/main/scripts/setup-ai-workspace-all-in-one.sh | AI_WORKSPACE_SECURITY_LEVEL=strict bash -
> ```
>
> **Combination: strict mode plus a single allowlisted opening (for example, exposing only the LiteLLM API)**
> ```bash
> curl -sfL https://raw.githubusercontent.com/ai-workspace-lab/xworkspace-console/main/scripts/setup-ai-workspace-all-in-one.sh | AI_WORKSPACE_SECURITY_LEVEL=strict LITELLM_API_CADDY_STRICT_WHITELIST=true bash -
> ```
>
> **Advanced customization: deploy the full stack and enable optional features as needed (for example, XRDP and a custom auth token)**
> ```bash
> curl -sfL https://raw.githubusercontent.com/ai-workspace-lab/xworkspace-console/main/scripts/setup-ai-workspace-all-in-one.sh | \
> XWORKSPACE_CONSOLE_ENABLE_XRDP=true \
> XWORKSPACE_CONSOLE_PUBLIC_ACCESS=true \
> XWORKMATE_BRIDGE_PUBLIC_ACCESS=true \
> GATEWAY_OPENCLAW_PUBLIC_ACCESS=false \
> VAULT_PUBLIC_ACCESS=false \
> LITELLM_API_CADDY_STRICT_WHITELIST=true \
> DEPLOY_TOKEN="my-secure-custom-token-123" \
> bash -
> ```
This document covers basic usage and then focuses on how the built-in global switch and the fine-grained `public_access` controls combine into the strictest network posture: every external web port proxy is removed, and services are reachable only over an encrypted VPN.
## TODO
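- [x] Wait for and verify the GitHub Actions release pipeline for the `xworkspace-console` offline package; confirm that `publish-release` runs to completion and the release artifacts are uploaded.
- [ ] Keep checking remote deployment progress on `root@acp-bridge.onwalk.net`; confirm that `setup-ai-workspace-all-in-one.sh` finishes and prints the unified summary.
- [x] `setup-ai-workspace-all-in-one.sh` prefers the offline installer package on the target host to speed up deployment and cut online download and install time.
- [ ] Verify that `setup-ai-workspace-all-in-one.sh` is idempotent: two consecutive runs on the same host both succeed, reuse credentials, the offline package cache, and already imported images, and wait safely for deployment/APT locks.
- [ ] Complete the final acceptance checks: the Bridge is reachable externally, all other services listen only locally by default, and `acp-codex` / `opencode` / `gemini` / `hermes` / `qmd` / `litellm` are healthy.
- [ ] Record the final commit hash and the remote verification results, and backfill them into the delivery results section of this plan.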
---
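## 1. Standard quick deployment
Use this command to deploy in **Standard security mode**: applications that need to offer some web/API endpoints externally, such as `XWorkmate Bridge`, are exposed to the public internet over HTTPS, while internal components stay isolated from one another.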
```bash
ansible-playbook -i inventory.ini setup-ai-workspace-all-in-one.yml \
--limit jp-xhttp-contabo.svc.plus \
--vault-password-file ~/.vault_password
```
---
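## 2. Maximum security: enforced full isolation (VPN only)
If you handle highly sensitive workloads, or the target server is a backend-only AI infrastructure node, you can configure it at the **strictest security level (Strict)**.
In this mode, every application that is public by default is **forcibly stripped of its public entry point (its Caddy proxy configuration or K8s Ingress is deleted outright). Even an attacker or scanner that knows the subdomain cannot get a request routed to your ports; every AI service on the server must then be reached through an internal encrypted tunnel (for example a WireGuard or Tailscale VPN).**
**Deployment command:**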
```bash
ansible-playbook -i inventory.ini setup-ai-workspace-all-in-one.yml \
--limit jp-xhttp-contabo.svc.plus \
--vault-password-file ~/.vault_password \
-e "ai_workspace_security_level=strict"
```
---
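## 3. Per-service exposure and blocking (the `-e` switches in detail)
The permission parameters are fine-grained: starting from `standard` mode you can cut a single application off from the public internet, and starting from `strict` mode you can open an allowlisted exception for a single application.
### Global policy switch
- `-e "ai_workspace_security_level=strict"`
  * **Effect:** Cuts off, in one step, every component that has a public entry point by default. It overrides the default policy of the switches below and forces them all to `false`.
### Fine-grained service exposure switches (individually overridable)
1. **XWorkspace Console (the main workspace portal) public access**
   - **Default:** `true` (under standard) / `false` (under strict)
   - **Parameter:** `-e "xworkspace_console_public_access=false"`
   - **Effect:** When set to true, local port 17000 is reverse-proxied by Caddy to the bound `workspace.svc.plus` domain for public access. When set to false, the proxy file is removed and the console is reachable only from the server's internal network or over XRDP.
2. **XWorkmate Bridge public access**
   - **Default:** `true` (under standard) / `false` (under strict)
   - **Parameter:** `-e "xworkmate_bridge_public_access=false"`
   - **Effect:** When set to false, the service's `.caddy` file under Caddy's `/etc/caddy/conf.d` is deleted, which removes the HTTPS path from outside to internal port 8787.
3. **OpenClaw Gateway public access**
   - **Default:** `false` (under every policy, the underlying model gateway does not open a UI entry point to the public internet by default)
   - **Parameter:** `-e "gateway_openclaw_public_access=true"`
   - **Effect:** If you are travelling without VPN access and urgently need to reach the remote OpenClaw platform, set this to true to temporarily generate the Caddy file and restore its public domain entry point.
4. **Vault KMS secrets center public access**
   - **Default:** `false`
   - **Parameter:** `-e "vault_public_access=true"`
   - **Effect:** When set to false, the Helm `ingress.enabled` value of the K8s deployment is forced to render as false, so no route is registered outside the cluster. Only when set to true can a public Ingress Class domain be bound.
5. **LiteLLM gateway access behavior**
   - **Default:** `false`
   - **Parameter:** `-e "litellm_api_caddy_strict_whitelist=true"`
   - **Effect:** Adds a further layer of protection to the Caddy proxy. When enabled, Caddy intercepts every request that does not match an official compatible model path (such as `/v1/chat/completions`) and answers `404`, which for example blocks public exposure of the front-end Dashboard UI (`/ui*`).
6. **Optional XRDP remote desktop**
   - **Default:** `false`
   - **Parameter:** `-e "xworkspace_console_enable_xrdp=true"`
   - **Effect:** By default the XFCE desktop environment is offered only through the browser-based Console UI. Add this parameter to connect to the target host with a native RDP client (such as Windows Remote Desktop).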
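
The interaction between the global level and the per-service switches can be sketched as a small resolver. This is an illustration only: `resolve_public_access` is a hypothetical helper, and the rule that an explicit `-e` value wins over the level default is inferred from the description in this section rather than taken from the role's source.

```python
from typing import Dict, Optional

# Defaults under the standard level, as listed in the switch descriptions.
STANDARD_DEFAULTS: Dict[str, bool] = {
    "xworkspace_console_public_access": True,
    "xworkmate_bridge_public_access": True,
    "gateway_openclaw_public_access": False,
    "vault_public_access": False,
}


def resolve_public_access(
    level: str, overrides: Optional[Dict[str, bool]] = None
) -> Dict[str, bool]:
    """Hypothetical resolver: strict forces every default to False,
    then explicit per-service values (the -e switches) are applied on top."""
    if level not in ("standard", "strict"):
        raise ValueError(f"unknown security level: {level}")
    overrides = overrides or {}
    unknown = set(overrides) - set(STANDARD_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown switches: {sorted(unknown)}")
    defaults = {k: (v and level == "standard") for k, v in STANDARD_DEFAULTS.items()}
    return {**defaults, **overrides}


if __name__ == "__main__":
    # Strict lockdown with one exception for the Bridge.
    print(resolve_public_access("strict", {"xworkmate_bridge_public_access": True}))
```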
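## Typical combined scenario
**Scenario: enable Strict global lockdown, but keep only the LiteLLM model API entry point open for third-party business endpoints, protected by the strictest allowlist.**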
```bash
ansible-playbook -i inventory.ini setup-ai-workspace-all-in-one.yml \
--limit jp-xhttp-contabo.svc.plus \
--vault-password-file ~/.vault_password \
-e "ai_workspace_security_level=strict" \
-e "litellm_api_caddy_strict_whitelist=true"
```
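
The effect of `litellm_api_caddy_strict_whitelist` on an incoming request can be modelled as a path check. A rough sketch under stated assumptions: only `/v1/chat/completions` is named in this document, so the other entries in `ALLOWED_PATHS` are placeholders, and `status_for` is a hypothetical function, not the Caddy matcher itself.

```python
# Assumed allowlist for illustration; the authoritative list lives in the
# Caddy fragment generated by the role. Only the first entry is documented.
ALLOWED_PATHS = (
    "/v1/chat/completions",
    "/v1/models",        # assumption
    "/v1/embeddings",    # assumption
)


def status_for(path: str, strict_whitelist: bool) -> int:
    """Return 404 for paths outside the allowlist when the switch is on.

    200 stands for "forwarded to LiteLLM"; the real upstream status may differ.
    """
    if not strict_whitelist:
        return 200
    return 200 if path in ALLOWED_PATHS else 404


if __name__ == "__main__":
    for p in ("/v1/chat/completions", "/ui/", "/ui/login"):
        print(p, status_for(p, strict_whitelist=True))
```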
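This fine-grained, declarative management ensures that the infrastructure is configured predictably, in line with Infrastructure as Code (IaC) security best practices.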

108
docs/tldr-ssh-security.md Normal file

@ -0,0 +1,108 @@
# TLDR: SSH Security & Hardening Playbook
Quick reference for SSH security hardening, firewall controls, Fail2ban management, and connection checking.
## 1. SSH Hardening (Key-Only Auth)
Password login is completely disabled for all users. Direct root login is restricted to key-only.
### Configuration file
Drop-in config is deployed to:
`/etc/ssh/sshd_config.d/00-disable-password.conf`
```text
PasswordAuthentication no
PubkeyAuthentication yes
KbdInteractiveAuthentication no
PermitRootLogin prohibit-password
```
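
To confirm that a drop-in file still carries these four directives, the text can be parsed and compared against the expected values. A minimal sketch: `missing_hardening` is a hypothetical helper, and it inspects only the text it is given, so it ignores `Match` blocks and other included files.

```python
REQUIRED = {
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
    "kbdinteractiveauthentication": "no",
    "permitrootlogin": "prohibit-password",
}


def parse_sshd_config(text: str) -> dict:
    """Parse 'Keyword value' lines. Keywords are case-insensitive and,
    as in sshd itself, the first value seen for a keyword wins."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings.setdefault(parts[0].lower(), parts[1].strip())
    return settings


def missing_hardening(text: str) -> list:
    """Return the required keywords that are absent or set to another value."""
    settings = parse_sshd_config(text)
    return sorted(k for k, v in REQUIRED.items() if settings.get(k, "").lower() != v)


if __name__ == "__main__":
    path = "/etc/ssh/sshd_config.d/00-disable-password.conf"
    with open(path, encoding="utf-8") as fh:
        print(missing_hardening(fh.read()) or "all hardening directives present")
```

For the effective configuration after all includes are merged, `sudo sshd -T` remains the authoritative check.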
### Apply Changes
If you update SSH configurations, reload sshd:
```bash
# Debian/Ubuntu
sudo systemctl reload ssh
# RedHat/CentOS
sudo systemctl reload sshd
```
---
## 2. Fail2ban Management
Fail2ban monitors SSH authentication failures and bans offending IPs.
### Default Settings
* **Bantime**: 24 hours (`86400` seconds)
* **Findtime**: 10 minutes (`600` seconds)
* **Maxretry**: 3 attempts
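
How the three settings interact can be shown with a small sliding-window model. This is a simplified illustration for a single IP, not Fail2ban's actual implementation; the class name `JailModel` is hypothetical.

```python
from collections import deque

BANTIME = 86400   # seconds (24 hours)
FINDTIME = 600    # seconds (10 minutes)
MAXRETRY = 3      # failures inside FINDTIME that trigger a ban


class JailModel:
    """Simplified model of one jail tracking one IP address."""

    def __init__(self):
        self.failures = deque()
        self.banned_until = None

    def is_banned(self, now: float) -> bool:
        return self.banned_until is not None and now < self.banned_until

    def record_failure(self, now: float) -> bool:
        """Register a failed login at `now`; return True if the IP is banned afterwards."""
        if self.is_banned(now):
            return True
        self.failures.append(now)
        # Drop failures that have fallen out of the FINDTIME window.
        while self.failures and now - self.failures[0] > FINDTIME:
            self.failures.popleft()
        if len(self.failures) >= MAXRETRY:
            self.banned_until = now + BANTIME
            self.failures.clear()
        return self.is_banned(now)


if __name__ == "__main__":
    jail = JailModel()
    print([jail.record_failure(t) for t in (0, 100, 200)])
```

Three failures spread over more than ten minutes never accumulate, which is why slow brute-force attempts need a longer `findtime` to be caught.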
### Useful Commands
```bash
# Check Fail2ban service status
sudo systemctl status fail2ban
# Check sshd jail status (banned IPs)
sudo fail2ban-client status sshd
# Unban a specific IP
sudo fail2ban-client set sshd unbanip <IP>
# Manually ban a specific IP
sudo fail2ban-client set sshd banip <IP>
# View fail2ban logs
sudo tail -f /var/log/fail2ban.log
```
---
## 3. SSH Proxy Connection Helper (`ssh_check.exp`)
A generic `expect` helper script to verify ProxyJump-ed SSH connectivity.
### Usage
To prevent password leaks in shell history (`~/.bash_history` or `~/.zsh_history`), **never** pass the password as a command-line argument. Instead, use one of the secure methods below:
#### Option A: Read securely from input (Recommended)
```bash
# Type your password securely (input will not echo on screen)
read -s SSH_CHECK_PASSWORD
export SSH_CHECK_PASSWORD
# Run the helper script (picks up password from env var)
ssh_check.exp admin@tky-proxy.svc.plus root@167.179.110.129
```
#### Option B: Set via env var with leading space
If your shell is configured to ignore commands starting with a space (e.g. `HISTCONTROL=ignorespace` in bash or `setopt HIST_IGNORE_SPACE` in zsh), you can set the variable with a leading space:
```bash
# Note the leading space before "export": it keeps this line out of shell history
 export SSH_CHECK_PASSWORD="your_password"
ssh_check.exp admin@tky-proxy.svc.plus root@167.179.110.129
```
#### Option C: Legacy/Direct (Not recommended, leaves history trace)
```bash
ssh_check.exp admin@tky-proxy.svc.plus root@167.179.110.129 "your_password"
```
---
## 4. Firewall (UFW) quick-ref
Used on hosts to manage ports (e.g. 80, 443, 1443).
```bash
# View firewall rules with line numbers
sudo ufw status numbered
# Allow a port to Anywhere
sudo ufw allow 443/tcp
# Delete a rule by rule number
sudo ufw delete <rule_number>
# Restrict port 22 to a specific IP (e.g. Proxy IP)
sudo ufw allow from 43.207.194.92 to any port 22 proto tcp
sudo ufw delete allow 22/tcp
# Reload firewall
sudo ufw reload
```


@ -0,0 +1,205 @@
# yitu-it-series R2 assets
This runbook migrates the local Google Drive `自媒体` directory to Cloudflare R2 for the Docusaurus AI Native knowledge base.
## Architecture
```text
GitHub -> Docusaurus -> Cloudflare Pages -> ebook.svc.plus
Google Drive local folder
-> rclone
-> Cloudflare R2 bucket: yitu-it-series
-> R2 custom domain: img.svc.plus
-> Docusaurus Markdown image URLs
```
## Source and target
```text
Local source:
/Users/shenlan/Library/CloudStorage/GoogleDrive-haitaopanhq@gmail.com/我的云端硬盘/自媒体
R2 bucket:
yitu-it-series
Public asset domain:
https://img.svc.plus
```
## Recommended object layout
```text
yitu-it-series/
├── covers/
├── xiaohongshu/
├── observability/
├── storage/
├── networking/
├── ai-native/
├── security/
├── platform-engineering/
└── ebook-assets/
```
Use stable, semantic paths for published content:
```text
covers/season-1/single-machine-to-platform-cover-v1.png
security/least-privilege/root-to-rootless-v1.png
ai-native/agentic-infra/ai-native-platform-v1.png
ebook-assets/diagrams/cloud-native-to-ai-native-v1.png
```
Prefer versioned object names instead of overwriting an already published image. This keeps Cloudflare CDN behavior predictable and preserves old articles.
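
Bumping a published object to its next revision is a mechanical rename of the `-vN` suffix. A minimal sketch, assuming every published key ends in `-v<number>.<extension>`; `next_version` is a hypothetical helper and not part of `sync-yitu-it-series-r2.sh`.

```python
import re

# <stem>-v<number><.ext>, e.g. covers/season-1/cover-v1.png
_VERSION_RE = re.compile(r"^(?P<stem>.+)-v(?P<n>\d+)(?P<ext>\.[^./]+)$")


def next_version(key: str) -> str:
    """Return the object key for the next revision of a versioned asset."""
    m = _VERSION_RE.match(key)
    if not m:
        raise ValueError(f"not a versioned object key: {key}")
    return f"{m['stem']}-v{int(m['n']) + 1}{m['ext']}"


if __name__ == "__main__":
    print(next_version("security/least-privilege/root-to-rootless-v1.png"))
```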
## Cloudflare API token
Create two token scopes if possible:
```text
Bootstrap token:
- Account: Cloudflare R2: Edit
- Zone: DNS: Edit, Zone: Read for svc.plus
- Used only for bucket/custom-domain setup
Long-running R2 S3 token:
- R2 Object Read & Write
- Scope limited to bucket yitu-it-series
- Used by rclone sync
```
Required environment variables:
```bash
export CF_ACCOUNT_ID="..."
export CF_ZONE_ID="..."
export CLOUDFLARE_API_TOKEN="..."
export R2_ACCESS_KEY_ID="..."
export R2_SECRET_ACCESS_KEY="..."
```
## Commands
From the playbooks directory:
```bash
cd /Users/shenlan/workspaces/cloud-neutral-toolkit/playbooks
chmod +x scripts/sync-yitu-it-series-r2.sh
scripts/sync-yitu-it-series-r2.sh doctor
scripts/sync-yitu-it-series-r2.sh create-bucket
scripts/sync-yitu-it-series-r2.sh configure-rclone
scripts/sync-yitu-it-series-r2.sh dry-run
scripts/sync-yitu-it-series-r2.sh copy
scripts/sync-yitu-it-series-r2.sh check
scripts/sync-yitu-it-series-r2.sh tree
scripts/sync-yitu-it-series-r2.sh configure-custom-domain
```
Use `copy` for the first production migration when preserving all historical remote files matters. Use `sync` for steady-state mirroring after the source layout is stable.
## Performance profile
Default large AI image profile:
```bash
export RCLONE_TRANSFERS=16
export RCLONE_CHECKERS=32
export RCLONE_S3_UPLOAD_CUTOFF=128M
export RCLONE_S3_CHUNK_SIZE=128M
```
Many small images:
```bash
export RCLONE_TRANSFERS=32
export RCLONE_CHECKERS=64
```
Large source files such as PSD/video:
```bash
export RCLONE_TRANSFERS=4
export RCLONE_CHECKERS=16
export RCLONE_S3_UPLOAD_CUTOFF=256M
export RCLONE_S3_CHUNK_SIZE=256M
```
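
The three profiles differ only in a handful of environment variables, so they can be kept in one table and emitted on demand. A small sketch: the profile labels and the `export_lines` helper are hypothetical names, while the values are the ones listed in this section.

```python
# Values copied from the profiles in this section; the keys are made-up labels.
PROFILES = {
    "large-ai-images": {
        "RCLONE_TRANSFERS": "16",
        "RCLONE_CHECKERS": "32",
        "RCLONE_S3_UPLOAD_CUTOFF": "128M",
        "RCLONE_S3_CHUNK_SIZE": "128M",
    },
    "many-small-images": {
        "RCLONE_TRANSFERS": "32",
        "RCLONE_CHECKERS": "64",
    },
    "large-source-files": {
        "RCLONE_TRANSFERS": "4",
        "RCLONE_CHECKERS": "16",
        "RCLONE_S3_UPLOAD_CUTOFF": "256M",
        "RCLONE_S3_CHUNK_SIZE": "256M",
    },
}


def export_lines(profile: str) -> list:
    """Render one profile as shell `export` lines."""
    return [f"export {name}={value}" for name, value in PROFILES[profile].items()]


if __name__ == "__main__":
    print("\n".join(export_lines("large-source-files")))
```

The output can be passed to `eval` in the shell that later runs the sync script.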
## Incremental sync
Install a macOS launchd sync job:
```bash
cd /Users/shenlan/workspaces/cloud-neutral-toolkit/playbooks
scripts/sync-yitu-it-series-r2.sh install-launchd
launchctl list | grep yitu-it-series
```
Remove it:
```bash
scripts/sync-yitu-it-series-r2.sh uninstall-launchd
```
## R2 custom domain
Target:
```text
img.svc.plus -> R2 bucket yitu-it-series
```
The script calls the Cloudflare R2 custom domain API:
```bash
scripts/sync-yitu-it-series-r2.sh configure-custom-domain
```
Recommended Cloudflare cache rule:
```text
If hostname equals img.svc.plus:
- Cache eligible
- Edge TTL: 30 days or longer
- Browser TTL: 7-30 days, or respect origin
```
## Docusaurus references
Markdown:
```md
![AI Native 基础设施演进](https://img.svc.plus/ai-native/ai-native-infra-cover-v1.png)
![最小权限演进](https://img.svc.plus/security/least-privilege-cover-v1.png)
```
MDX:
```mdx
<img
src="https://img.svc.plus/platform-engineering/platform-engineering-roadmap-v1.png"
alt="Platform Engineering Roadmap"
loading="lazy"
/>
```
Front matter:
```md
---
title: AI Native 基础设施演进
description: 从云原生到 AI Native 的平台工程知识库
image: https://img.svc.plus/covers/ai-native-infra-cover-v1.png
---
```
## AI Native knowledge-base practices
- Keep Docusaurus focused on Markdown, MDX, navigation, SEO, and search.
- Keep heavy generated images and ebook assets in R2.
- Reference published assets with absolute `https://img.svc.plus/...` URLs.
- Keep object names immutable after publication; publish revisions with `-v2`, `-v3`.
- Run `rclone check` before replacing local Markdown image references.
- Keep raw generation artifacts separate from article-ready assets when possible.
- Use topic directories that match the ebook taxonomy so future RAG/vector indexing can attach image context to chapters.
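
Two of these practices (absolute `https://img.svc.plus/...` URLs and versioned object names) can be checked mechanically before a page is published. A minimal sketch: `lint_image_refs` is a hypothetical helper, and it only looks at Markdown `![alt](url)` images, not MDX `<img>` tags.

```python
import re

ASSET_BASE = "https://img.svc.plus/"
_IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
_VERSIONED_RE = re.compile(r"-v\d+\.[A-Za-z0-9]+$")


def lint_image_refs(markdown: str) -> list:
    """Return (url, problem) pairs for images that break the practices above."""
    problems = []
    for url in _IMAGE_RE.findall(markdown):
        if not url.startswith(ASSET_BASE):
            problems.append((url, "not an absolute img.svc.plus URL"))
        elif not _VERSIONED_RE.search(url):
            problems.append((url, "object name has no -vN suffix"))
    return problems


if __name__ == "__main__":
    sample = "![cover](https://img.svc.plus/covers/cover.png)\n![local](./img/a.png)"
    for url, problem in lint_image_refs(sample):
        print(f"{url}: {problem}")
```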


@ -0,0 +1,7 @@
cd /Users/shenlan/workspaces/cloud-neutral-toolkit/playbooks && ansible-playbook \
-i "xworkmate-bridge.svc.plus," \
--user ubuntu \
-e "xworkspace_console_hosts=xworkmate-bridge.svc.plus" \
-e "xworkspace_console_local_dashboard_dir=/home/ubuntu/xworkspace/dashboard" \
-e "ansible_become_pass=XXXXXXXXX" \
setup-xworkspace-console.yaml


@ -0,0 +1,15 @@
CF_ACCOUNT_ID=
CF_ZONE_ID=
CLOUDFLARE_API_TOKEN=
R2_ACCESS_KEY_ID=
R2_SECRET_ACCESS_KEY=
R2_BUCKET=yitu-it-series
R2_REMOTE=cloudflare-r2
R2_CUSTOM_DOMAIN=img.svc.plus
LOCAL_SRC=/Users/shenlan/Library/CloudStorage/GoogleDrive-haitaopanhq@gmail.com/我的云端硬盘/自媒体
RCLONE_TRANSFERS=16
RCLONE_CHECKERS=32
RCLONE_S3_UPLOAD_CUTOFF=128M
RCLONE_S3_CHUNK_SIZE=128M


@ -0,0 +1,8 @@
---
- name: Prepare Host Environment
  hosts: all
  become: true
  roles:
    - roles/vhosts/common
    - roles/vhosts/kernel_tuning
    - roles/docker/container_runtime


@ -0,0 +1,7 @@
---
- name: Install Kubernetes via Sealos
  hosts: masters
  become: true
  roles:
    - roles/vhosts/sealos_cluster
    - roles/vhosts/cni_cilium


@ -0,0 +1,6 @@
---
- name: Install NVIDIA GPU Operator
  hosts: masters[0]
  become: true
  roles:
    - roles/charts/nvidia_gpu_operator

7
gpu_inference_04_ray.yml Normal file

@ -0,0 +1,7 @@
---
- name: Deploy Ray Cluster
  hosts: masters[0]
  become: true
  roles:
    - roles/charts/ray_cluster
    - roles/charts/ray_service


@ -0,0 +1,7 @@
---
- name: Deploy vLLM Inference Service
  hosts: masters[0]
  become: true
  roles:
    - roles/charts/vllm_runtime
    - roles/charts/vllm_service

6
gpu_inference_site.yml Normal file

@ -0,0 +1,6 @@
---
- import_playbook: gpu_inference_01_prepare.yml
- import_playbook: gpu_inference_02_sealos.yml
- import_playbook: gpu_inference_03_gpu_operator.yml
- import_playbook: gpu_inference_04_ray.yml
- import_playbook: gpu_inference_05_vllm.yml


@ -3,3 +3,17 @@ ansible_ssh_user: root
ansible_ssh_private_key_file: ~/.ssh/id_rsa
ansible_host_key_checking: False
# Global security level for public access.
# Set to 'strict' to disable public Caddy/Ingress access for all roles.
ai_workspace_security_level: standard
# Caddy ingress is enabled by default on Linux where we expect a dedicated box.
# It is disabled on macOS (developer workstation with port conflicts) and Windows
# (Caddy not natively supported in our Windows pipeline).
# Override anytime with -e caddy_enabled=true or -e caddy_enabled=false.
caddy_enabled: "{{ ansible_os_family != 'Darwin' and ansible_os_family != 'Windows' }}"
# Caddy config root. Linux uses the system path /etc/caddy; macOS (Homebrew)
# uses /opt/homebrew/etc/caddy. Roles derive their Caddyfile / conf.d / fragment
# paths from this so a force-enabled Caddy on macOS writes to the brew location.
caddy_config_dir: "{{ '/opt/homebrew/etc/caddy' if ansible_os_family == 'Darwin' else '/etc/caddy' }}"


@ -0,0 +1,49 @@
---
xworkmate_bridge_distributed_topology: dual-node
xworkmate_bridge_distributed_nodes:
  - id: xworkmate-bridge
    role: primary
    public_base_url: https://xworkmate-bridge.svc.plus
    bridge_endpoint: http://172.29.10.1:8787
  - id: cn-xworkmate-bridge
    role: edge
    public_base_url: https://cn-xworkmate-bridge.svc.plus
    bridge_endpoint: http://172.29.10.2:8787
xworkmate_bridge_distributed_vpn_interface: wg-xwm
xworkmate_bridge_distributed_vpn_wireguard_port: 51820
xworkmate_bridge_distributed_vpn_local_tproxy_port: 51830
xworkmate_bridge_distributed_vpn_vless_port: 2443
xworkmate_bridge_distributed_vpn_forwarder_port: 8787
xworkmate_bridge_distributed_vpn_forwarder_target: 127.0.0.1:8787
xworkmate_bridge_distributed_vpn_vault_addr: "{{ lookup('ansible.builtin.env', 'VAULT_SERVER_URL') | default('https://vault.svc.plus', true) }}"
xworkmate_bridge_distributed_vpn_vault_token: "{{ lookup('ansible.builtin.env', 'VAULT_SERVER_ROOT_ACCESS_TOKEN') | default(lookup('ansible.builtin.env', 'VAULT_TOKEN'), true) }}"
xworkmate_bridge_distributed_vpn_vault_mount: kv
xworkmate_bridge_distributed_vpn_vault_base_path: xworkmate-bridge/distributed/wireguard-over-vless
xworkmate_bridge_distributed_vpn_nodes:
  jp-xhttp-contabo.svc.plus:
    node_id: xworkmate-bridge
    domain: xworkmate-bridge.svc.plus
    wg_ip: 172.29.10.1
    public_key: 1staGq8lmHFRFRFNj2QOFx/MPxb/1fFV4tawC6xSi1Q= # gitleaks:allow
    peer: cn-xworkmate-bridge.svc.plus
  cn-xworkmate-bridge.svc.plus:
    node_id: cn-xworkmate-bridge
    domain: cn-xworkmate-bridge.svc.plus
    wg_ip: 172.29.10.2
    public_key: iYlnFaWiMfMelpiN8ZV2SwCDrLihqtJXvHUsM3BN9zU= # gitleaks:allow
    peer: jp-xhttp-contabo.svc.plus
xworkmate_bridge_distributed_vpn_clients:
  - id: shenlan-macos
    wg_ip: 172.29.10.10
    public_key: jfHsw1HIqRQzGvfsRfdkS7BLThDbBvWMsAlJRp1kdkw= # gitleaks:allow
    attach_to:
      - jp-xhttp-contabo.svc.plus
      - cn-xworkmate-bridge.svc.plus
  - id: shenlan-ios
    wg_ip: 172.29.10.11
    public_key: I/zCL7gLWrY6FZiLXUs7i/vivU5Xuo8r7EbkNhtv12w= # gitleaks:allow
    attach_to:
      - jp-xhttp-contabo.svc.plus


@ -0,0 +1,16 @@
---
xworkmate_bridge_domain: cn-xworkmate-bridge.svc.plus
xworkmate_bridge_public_base_url: https://cn-xworkmate-bridge.svc.plus
xworkmate_bridge_service_domain: cn-xworkmate-bridge.svc.plus
xworkmate_bridge_service_public_base_url: https://cn-xworkmate-bridge.svc.plus
xworkmate_bridge_binary_path: /usr/local/bin/xworkmate-bridge
xworkmate_bridge_service_user: ubuntu
xworkmate_bridge_service_group: ubuntu
xworkmate_bridge_service_home: /home/ubuntu
xworkmate_bridge_required_services: []
xworkmate_bridge_required_listeners:
  - host: 127.0.0.1
    port: "8787"
    name: bridge
xworkmate_bridge_distributed_local_node_id: cn-xworkmate-bridge
xworkmate_bridge_distributed_task_forward_peer_id: xworkmate-bridge


@ -0,0 +1,2 @@
---
gateway_openclaw_acp_enabled: true


@ -0,0 +1,25 @@
---
# LiteLLM Admin UI Credentials
litellm_basic_auth_username: admin
# Database Configuration
litellm_database_host: "127.0.0.1"
litellm_database_port: "15432"
litellm_database_sslmode: "disable"
litellm_database_name: "litellm"
litellm_database_user: "litellm"
litellm_database_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  33303762616661623962386664653533666362393435343830303061613364666238313933626330
  6231303463663732313732376238633033386463383134630a313738393762333263653363376266
  37323938386331383762363565613361623638353834363735363030363037626666663431613239
  3939323036313435360a313833316131663231326162393364616262323763333133336335323837
  31336464646334653035646633363164633363353835316466626337633238396130
litellm_master_key: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  38303433656665323039303561326534636136623766303563313863633133333564343830663032
  3038343634323835666430663165343461643338343334330a623764393034356330303263366161
  30663735663237653135356663343063353330356137643534313637313062633964383266376263
  6432333730393232380a363037333265363462323139306534343563323631616464616132313631
  31396232386633656436653966626131663139643539633964633864643930643639


@ -0,0 +1,3 @@
---
xworkmate_bridge_distributed_local_node_id: xworkmate-bridge
xworkmate_bridge_distributed_task_forward_peer_id: ""


@ -1,11 +1,23 @@
# Vhosts
[cn_front_host]
# services: cn-front.svc.plus
# services: cn-front.svc.plus, cn-homepage.svc.plus
cn-front.svc.plus ansible_host=47.120.61.35 ansible_user=root ansible_ssh_user=root firewall_manage_ufw=false service_domains=cn-front.svc.plus
[cn_homepage_host]
# services: cn-homepage.svc.plus
cn-homepage.svc.plus ansible_host=47.120.61.35 ansible_user=root ansible_ssh_user=root
[cn_xworkmate_bridge_host]
# services: cn-xworkmate-bridge.svc.plus
cn-xworkmate-bridge.svc.plus ansible_host=47.120.61.35 ansible_user=root ansible_ssh_user=root service_domains=cn-xworkmate-bridge.svc.plus
[global_homepage_host]
# services: global-homepage.svc.plus
global-homepage.svc.plus ansible_host=46.250.251.132 ansible_user=root ansible_ssh_user=root
[jp_xhttp_contabo_host]
# services: api.svc.plus, console.svc.plus, accounts.svc.plus, acp-server.svc.plus, xworkmate-bridge.svc.plus, vault.svc.plus, openclaw.svc.plus, postgresql.svc.plus
jp-xhttp-contabo.svc.plus ansible_host=46.250.251.132 ansible_user=root ansible_ssh_user=root service_domains=api.svc.plus,console.svc.plus,accounts.svc.plus,acp-server.svc.plus,xworkmate-bridge.svc.plus,vault.svc.plus,openclaw.svc.plus,postgresql.svc.plus xray_exporter_node_id_custom=jp-xhttp-contabo.svc.plus
# services: api.svc.plus, console.svc.plus, docs.svc.plus, accounts.svc.plus, xworkmate-bridge.svc.plus, xworkmate-bridge.svc.plus, vault.svc.plus, postgresql.svc.plus
jp-xhttp-contabo.svc.plus ansible_host=46.250.251.132 ansible_user=root ansible_ssh_user=root service_domains=api.svc.plus,console.svc.plus,docs.svc.plus,accounts.svc.plus,xworkmate-bridge.svc.plus,xworkmate-bridge.svc.plus,vault.svc.plus,postgresql.svc.plus xray_exporter_node_id_custom=jp-xhttp-contabo.svc.plus
[tky_proxy_host]
# services: tky-proxy.svc.plus
@ -31,12 +43,25 @@ jp-xhttp-contabo.svc.plus
tky-proxy.svc.plus
jp-xhttp-contabo.svc.plus
[xworkmate_bridge]
jp-xhttp-contabo.svc.plus
[cn_xworkmate_bridge]
cn-xworkmate-bridge.svc.plus
[xworkmate_bridge_distributed:children]
xworkmate_bridge
cn_xworkmate_bridge
[billing_service]
jp-xhttp-contabo.svc.plus
[accounts]
jp-xhttp-contabo.svc.plus
[docs]
jp-xhttp-contabo.svc.plus
[apisix]
jp-xhttp-contabo.svc.plus
@ -58,3 +83,6 @@ ansible_host_key_checking=False
# SSH key or password (choose one)
ansible_ssh_private_key_file=~/.ssh/id_rsa
k3s_platform_git_private_key=~/.ssh/id_rsa
[acp_bridge_host]
acp-bridge.onwalk.net ansible_host=167.179.110.129 ansible_user=root ansible_ssh_user=root


@ -0,0 +1,27 @@
---
# Global versions and images
kubernetes_version: "v1.28.9"
sealos_version: "5.0.0"
cilium_version: "1.15.5"
gpu_operator_version: "v24.3.0"
kuberay_version: "1.1.0"
ray_version: "2.9.0"
vllm_image: "vllm/vllm-openai:v0.4.2"
# Network configuration
pod_cidr: "10.244.0.0/16"
service_cidr: "10.96.0.0/12"
nccl_socket_ifname: "eth0"
gloo_socket_ifname: "eth0"
# Model and inference configuration
vllm_model: "/models/Llama-3-70B-Instruct"
vllm_tensor_parallel_size: 2
vllm_pipeline_parallel_size: 1
# GPU driver policy
driver_enabled: true
driver_version: "535.129.03"
dcgm_exporter_enabled: true
ansible_user: "root"


@ -0,0 +1 @@
---


@ -0,0 +1 @@
---


@ -0,0 +1 @@
---

13
inventory/hosts.ini Normal file

@ -0,0 +1,13 @@
[masters]
k8s-master-01 ansible_host=10.0.0.10
[gpu_workers]
k8s-gpu-01 ansible_host=10.0.0.21 accelerator=nvidia-h100
k8s-gpu-02 ansible_host=10.0.0.22 accelerator=nvidia-h100
[ray_workers:children]
gpu_workers
[k8s_cluster:children]
masters
gpu_workers

106
inventory/terraform_cmdb.py Executable file

@ -0,0 +1,106 @@
#!/usr/bin/env python3
"""Ansible dynamic inventory backed by the CMDB exported from Terraform.

IaC integration:
    generate.py in iac_modules/terraform-hcl-standard/vultr-vps/envs/ai-workspace/
    works alongside `terraform apply`: it merges the static YAML fields with
    Terraform's runtime outputs and writes cmdb.json (structured host facts).
    This script translates that file into an Ansible dynamic inventory, so
    whenever the IaC changes, rerunning `generate.py inventory` updates the
    inventory with it.

Lookup order:
    1. The file named by the AI_WORKSPACE_CMDB_JSON environment variable
    2. cmdb.json under AI_WORKSPACE_TF_DIR (or the default env directory)

Usage:
    ansible-inventory -i inventory/terraform_cmdb.py --list
    ansible all -i inventory/terraform_cmdb.py -m ping
"""
import json
import os
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
# playbooks/inventory -> repository root -> terraform env
REPO_ROOT = os.path.abspath(os.path.join(HERE, "..", ".."))
DEFAULT_TF_DIR = os.path.join(
    REPO_ROOT,
    "iac_modules",
    "terraform-hcl-standard",
    "vultr-vps",
    "envs",
    "ai-workspace",
)


def _from_explicit_file():
    path = os.environ.get("AI_WORKSPACE_CMDB_JSON")
    if path and os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return None


def _from_default_file(tf_dir):
    path = os.path.join(tf_dir, "cmdb.json")
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return None


def load_cmdb():
    tf_dir = os.environ.get("AI_WORKSPACE_TF_DIR", DEFAULT_TF_DIR)
    for loader in (
        _from_explicit_file,
        lambda: _from_default_file(tf_dir),
    ):
        data = loader()
        if data:
            return data
    return {}


def build_inventory(cmdb):
    inv = {"_meta": {"hostvars": {}}}
    groups = {}
    for name, host in cmdb.items():
        hostvars = {
            "ansible_host": host.get("ip"),
            "ansible_user": host.get("ansible_user", "root"),
            # Cloud host IPs are often recycled; relax host key checking to
            # avoid clashing with stale known_hosts entries.
            "ansible_ssh_common_args": (
                "-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
            ),
        }
        # Expose the remaining CMDB fields to playbooks as well.
        hostvars.update(host.get("host_vars", {}))
        hostvars["cmdb_instance_id"] = host.get("instance_id")
        hostvars["cmdb_os_id"] = host.get("os_id")
        hostvars["cmdb_tags"] = host.get("tags", [])
        inv["_meta"]["hostvars"][name] = hostvars
        for group in host.get("groups", []) or ["ungrouped"]:
            groups.setdefault(group, {"hosts": []})["hosts"].append(name)
    inv.update(groups)
    # Use a set so "ungrouped" is not listed twice when a host already landed in it.
    inv["all"] = {"children": sorted(set(groups) | {"ungrouped"})}
    return inv


def main():
    args = sys.argv[1:]
    cmdb = load_cmdb()
    if "--host" in args:
        # hostvars are already in _meta, so a single-host query returns an empty object.
        print(json.dumps({}))
        return
    # The default behaves the same as --list.
    print(json.dumps(build_inventory(cmdb), indent=2))


if __name__ == "__main__":
    main()


@ -0,0 +1,95 @@
# Agent Skills
Synchronizes controller skill sources to an Ubuntu runtime user's canonical
skills directory, then exposes the same directory to agent-specific skill
locations.
Default source and target:
- local marketplace source: `~/.agents/skills/`
- local repository source: `../xworkspace-core-skills/skills/`
- remote canonical path: `/home/ubuntu/.agents/skills/`
- default agent targets: `codex`, `gemini`, `opencode`, `hermes`, `openclaw`
The repository source is categorized by capability domain, for example
`video-production/`, `image-production/`, `animation/`, and `workspace-core/`.
The role syncs those categories as-is, then creates root-level symlinks for
nested skills so runtimes that scan one directory level can still discover them.
Set `agent_skills_xworkspace_core_enabled=false` to use only the marketplace
source, or `agent_skills_remote_flatten_nested_skills=false` to disable root
symlink materialization.
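
The flattening step can be pictured as follows. This is a sketch under stated assumptions, not the role's actual task: it treats a directory as a skill when it contains a `SKILL.md` file (an assumed marker), and `flatten_nested_skills` is a hypothetical helper.

```python
import os
from pathlib import Path


def flatten_nested_skills(root: Path, marker: str = "SKILL.md") -> list:
    """Create root-level symlinks for skills nested one category level down.

    Assumption: a directory counts as a skill when it contains `marker`.
    Returns the names of the links that were created.
    """
    created = []
    categories = sorted(p for p in root.iterdir() if p.is_dir() and not p.is_symlink())
    for category in categories:
        if (category / marker).exists():
            continue  # already a root-level skill, not a category directory
        for skill in sorted(p for p in category.iterdir() if p.is_dir()):
            link = root / skill.name
            if not (skill / marker).exists():
                continue
            if link.exists() or link.is_symlink():
                continue  # never overwrite an existing entry
            # Relative target keeps the tree relocatable.
            link.symlink_to(os.path.relpath(skill, root), target_is_directory=True)
            created.append(link.name)
    return created


if __name__ == "__main__":
    print(flatten_nested_skills(Path.home() / ".agents" / "skills"))
```

Running it twice is harmless: existing links are skipped, which matches the overlay behavior described for the sync.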
The role keeps one remote source of truth and links each agent's skills entry to
that canonical directory where the online runtime already uses links. Existing
non-symlink target directories are rejected by default to avoid silently deleting
agent-owned content. The live `ubuntu` Codex runtime on
`xworkmate-bridge.svc.plus` keeps `/home/ubuntu/.codex/skills` as a real
directory, so it is preserved by default through
`agent_skills_preserve_existing_target_dirs`. Set
`agent_skills_replace_existing_target_dirs=true` only when those target
directories should be replaced.
Before syncing, the role can materialize the skills needed by XWorkmate typical
scenario tests into the local canonical source. The default matrix includes:
| Scenario group | Skills |
| --- | --- |
| local document artifacts | `pptx`, `docx`, `xlsx`, `pdf` |
| local image processing | `image-resizer` |
| local browser automation | `browser-automation` |
| online image generation | `image-cog` |
| online image/video editing | `image-video-generation-editting`, `wan-image-video-generation-editting` |
| online video translation | `video-translator` |
| online news/search | `web-search`, `news-fetch`, `find-skills` |
| skill maintenance | `find-skills`, `self-improving`, `skill-vetter`, `skills-security-check` |
Missing local skills are installed on the Ansible controller before rsync. The
installer adapter order is:
1. `clawhub --workdir ~/.agents --dir skills --no-input install <skill>`
2. `find-skills install <skill> --target ~/.agents/skills`
Set `agent_skills_auto_install_enabled=false` to require that all skills are
already present locally. Set
`agent_skills_auto_install_fail_on_missing_installer=false` to skip missing
skills when neither installer is available; the role still fails later if a
required skill cannot be resolved.
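
The adapter order amounts to picking the first installer found on `PATH`. A minimal sketch: `installer_command` is a hypothetical helper, the two command lines are the ones listed above, and `~` is expanded explicitly because no shell is involved when an argv list is executed.

```python
import os
import shutil
from typing import Callable, List, Optional


def installer_command(
    skill: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the argv for the first available installer, or None if neither exists."""
    agents_dir = os.path.expanduser("~/.agents")
    if which("clawhub"):
        return ["clawhub", "--workdir", agents_dir, "--dir", "skills",
                "--no-input", "install", skill]
    if which("find-skills"):
        return ["find-skills", "install", skill,
                "--target", os.path.join(agents_dir, "skills")]
    return None


if __name__ == "__main__":
    argv = installer_command("pdf")
    print(argv if argv else "no installer available")
```

A `None` result corresponds to the case governed by `agent_skills_auto_install_fail_on_missing_installer`.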
Required-skill checks search both the marketplace source and the categorized
repository source recursively. Auto-install still writes only to
`~/.agents/skills/`; repository-owned skills should be changed in
`xworkspace-core-skills`.
After install, optional local quality gates run for each resolved skill when the
command exists:
- `skill-vetter <skill_path>`
- `skills-security-check <skill_path>`
- `self-improving inspect <skill_path>`
The quality gates are enabled by default and fail the play when a present gate
returns an error. Override `agent_skills_quality_gate_enabled=false` or
`agent_skills_quality_gate_fail_on_error=false` only for controlled bootstrap
environments.
Default sync excludes local runtime artifacts such as `.venv/`, `__pycache__/`,
`.pyc`, and `.DS_Store`; skills should ship source, scripts, templates, and
references rather than controller-local virtual environments.
The sync defaults to overlay mode (`agent_skills_delete_removed=false`) so it
does not remove skills that already exist on the live runtime catalog. Enable
deletion only for controlled rebuilds of `/home/ubuntu/.agents/skills/`.
Example:
```bash
ansible-playbook -i inventory.ini -l jp-xhttp-contabo.svc.plus setup-ai-agent-skills.yml --tags agent_skills
```
Bootstrap-only example that keeps the existing local source strict but skips
quality gate failures from newly installed marketplace skills:
```bash
ansible-playbook -i inventory.ini -l jp-xhttp-contabo.svc.plus setup-ai-agent-skills.yml --tags agent_skills \
-e agent_skills_quality_gate_fail_on_error=false
```


@ -0,0 +1,126 @@
---
agent_skills_user: "{{ ansible_env.USER | default('ubuntu') }}"
agent_skills_group: "{{ 'staff' if ansible_os_family == 'Darwin' else agent_skills_user }}"
agent_skills_home: "{{ ansible_env.HOME | default('/home/' + agent_skills_user) }}"
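# The canonical skills directory always lives on the target host. Installers
# write directly into it, and the core skills are merged in after being cloned.
# Behavior is the same for the local/pull model and the remote-controller model.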
agent_skills_remote_dir: "{{ agent_skills_home }}/.agents/skills"
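# xworkspace-core-skills is fetched with git clone (the most generic,
# cross-platform option, consistent across both models). The clone happens on
# the target host and no longer depends on a directory pre-staged on the controller.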
agent_skills_xworkspace_core_enabled: true
agent_skills_xworkspace_core_required: true
agent_skills_xworkspace_core_repo_url: "https://github.com/ai-workspace-lab/xworkspace-core-skills.git"
agent_skills_xworkspace_core_version: "main"
agent_skills_xworkspace_core_clone_dir: "{{ agent_skills_home }}/.local/src/xworkspace-core-skills"
agent_skills_xworkspace_core_source_dir: "{{ agent_skills_xworkspace_core_clone_dir }}/skills"
agent_skills_replace_existing_target_dirs: false
agent_skills_preserve_existing_target_dirs:
- "{{ agent_skills_home }}/.codex/skills"
agent_skills_remote_flatten_nested_skills: true
agent_skills_auto_install_enabled: true
agent_skills_auto_install_fail_on_missing_installer: true
agent_skills_quality_gate_enabled: true
agent_skills_quality_gate_fail_on_error: true
agent_skills_quality_gate_commands:
- name: skill-vetter
argv_prefix:
- skill-vetter
- name: skills-security-check
argv_prefix:
- skills-security-check
- name: self-improving
argv_prefix:
- self-improving
- inspect
agent_skills_typical_scenario_skills:
- name: pptx
scenario_groups: [local-document-artifacts]
aliases: [pptx]
source: local-or-clawhub
- name: docx
scenario_groups: [local-document-artifacts]
aliases: [docx]
source: local-or-clawhub
- name: xlsx
scenario_groups: [local-document-artifacts]
aliases: [xlsx]
source: local-or-clawhub
- name: pdf
scenario_groups: [local-document-artifacts]
aliases: [pdf]
source: local-or-clawhub
- name: image-resizer
scenario_groups: [local-image-processing]
aliases: [image-resizer]
source: clawhub
- name: browser-automation
scenario_groups: [local-browser-automation]
aliases: [browser-automation]
source: clawhub
- name: image-cog
scenario_groups: [online-image-generation]
aliases: [image-cog]
source: acp-descriptor
- name: image-video-generation-editting
scenario_groups: [online-image-video-editing]
aliases:
- image-video-generation-editting
- wan-image-video-generation-editting
source: acp-descriptor
- name: video-translator
scenario_groups: [online-video-translation]
aliases: [video-translator]
source: acp-descriptor
- name: web-search
scenario_groups: [online-news-fetch, online-search]
aliases:
- web-search
- search
- autoglm-websearch
source: clawhub
- name: news-fetch
scenario_groups: [online-news-fetch]
aliases:
- news-fetch
- blogwatcher
source: clawhub
- name: find-skills
scenario_groups: [skill-maintenance, online-news-fetch, online-search]
aliases: [find-skills]
source: local-or-clawhub
- name: self-improving
scenario_groups: [skill-maintenance]
aliases:
- self-improving
- self-improving-1.1.3
source: local-or-clawhub
- name: skill-vetter
scenario_groups: [skill-maintenance]
aliases:
- skill-vetter
- skill-vetter-1.0.0
source: local-or-clawhub
- name: skills-security-check
scenario_groups: [skill-maintenance]
aliases: [skills-security-check]
source: local-or-clawhub
install_force: true
agent_skills_extra_required_skills: []
agent_skills_targets:
- name: codex
paths:
- "{{ agent_skills_home }}/.codex/skills"
- name: gemini
paths:
- "{{ agent_skills_home }}/.gemini/skills"
- name: opencode
paths:
- "{{ agent_skills_home }}/.opencode/skills"
- "{{ agent_skills_home }}/.config/opencode/skills"
- name: openclaw
paths:
- "{{ agent_skills_home }}/.openclaw/skills"

@ -0,0 +1,348 @@
---
# Design: everything runs on the target host; there is no delegate_to: localhost,
# so both execution models behave identically:
#   - local/pull: curl|bash -> ansible-playbook -c local (localhost is the host)
#   - remote controller: ansible-playbook -i <inventory> over ssh (tasks run on the host)
# Sources are fetched with git clone (generic, cross-platform), not from a directory pre-seeded
# on the controller; merging uses ansible.builtin.copy (no bare rsync, no local pinning).
- name: Validate agent skills input
  ansible.builtin.assert:
    that:
      - agent_skills_user | length > 0
      - agent_skills_group | length > 0
      - agent_skills_home | length > 0
      - agent_skills_remote_dir | length > 0
      - agent_skills_targets | length > 0
    fail_msg: "agent_skills_user/group/home, remote_dir and targets must be set."
- name: Build required agent skills list
  ansible.builtin.set_fact:
    agent_skills_required_entries: "{{ agent_skills_typical_scenario_skills + agent_skills_extra_required_skills }}"
- name: Ensure agent skills owner home exists
  ansible.builtin.file:
    path: "{{ agent_skills_home }}"
    state: directory
    owner: "{{ agent_skills_user }}"
    group: "{{ agent_skills_group }}"
    mode: "0755"
- name: Ensure canonical agent skills directory exists
  ansible.builtin.file:
    path: "{{ agent_skills_remote_dir }}"
    state: directory
    owner: "{{ agent_skills_user }}"
    group: "{{ agent_skills_group }}"
    mode: "0755"
# --- Source fetch: git clone on the target host (most generic) ---------------
- name: Ensure core skills checkout parent exists
  ansible.builtin.file:
    path: "{{ agent_skills_xworkspace_core_clone_dir | dirname }}"
    state: directory
    owner: "{{ agent_skills_user }}"
    group: "{{ agent_skills_group }}"
    mode: "0755"
  when: agent_skills_xworkspace_core_enabled | bool
- name: Clone/update xworkspace core skills on the target host
  ansible.builtin.git:
    repo: "{{ agent_skills_xworkspace_core_repo_url }}"
    dest: "{{ agent_skills_xworkspace_core_clone_dir }}"
    version: "{{ agent_skills_xworkspace_core_version }}"
    depth: 1
    force: true
  become_user: "{{ agent_skills_user }}"
  register: agent_skills_core_clone
  when: agent_skills_xworkspace_core_enabled | bool
- name: Inspect core skills directory
  ansible.builtin.stat:
    path: "{{ agent_skills_xworkspace_core_source_dir }}"
  register: agent_skills_core_skills_stat
  when: agent_skills_xworkspace_core_enabled | bool
- name: Require core skills directory when enabled and required
  ansible.builtin.assert:
    that:
      - agent_skills_core_skills_stat.stat.isdir | default(false)
    fail_msg: "core skills dir missing after clone: {{ agent_skills_xworkspace_core_source_dir }}"
  when:
    - agent_skills_xworkspace_core_enabled | bool
    - agent_skills_xworkspace_core_required | bool
- name: Build skill search dirs (canonical + core checkout)
  ansible.builtin.set_fact:
    agent_skills_search_dirs: >-
      {{
        [agent_skills_remote_dir]
        + (
          (
            agent_skills_xworkspace_core_enabled | bool
            and agent_skills_core_skills_stat.stat.isdir | default(false)
          )
          | ternary([agent_skills_xworkspace_core_source_dir], [])
        )
      }}
# --- Missing scenario skills: install into canonical via installer adapters (on the host) ---
- name: Inspect required scenario skills presence
  ansible.builtin.shell: |
    set -eu
    for d in {{ agent_skills_search_dirs | map('quote') | join(' ') }}; do
      for c in {{ ([item.name] + (item.aliases | default([]))) | unique | map('quote') | join(' ') }}; do
        if [ -f "$d/$c/SKILL.md" ]; then printf '%s\n' "$d/$c"; exit 0; fi
        m="$(find "$d" -type f -path "*/$c/SKILL.md" -print -quit 2>/dev/null || true)"
        if [ -n "$m" ]; then dirname "$m"; exit 0; fi
      done
    done
    exit 1
  args:
    executable: /bin/bash
  register: agent_skills_presence
  changed_when: false
  failed_when: false
  check_mode: false
  loop: "{{ agent_skills_required_entries }}"
  loop_control:
    label: "{{ item.name }}"
- name: Build missing scenario skills list
  ansible.builtin.set_fact:
    agent_skills_missing_entries: >-
      {{ agent_skills_presence.results | selectattr('rc', 'ne', 0) | map(attribute='item') | list }}
- name: Install missing scenario skills via installer adapters (clawhub/find-skills)
  ansible.builtin.shell: |
    set -eu
    skill={{ item.name | quote }}
    target_dir={{ agent_skills_remote_dir | quote }}
    parent="$(dirname "$target_dir")"; base="$(basename "$target_dir")"
    rc=1
    if command -v clawhub >/dev/null 2>&1; then
      for n in {{ ([item.install_name | default(item.name)] + (item.aliases | default([]))) | unique | map('quote') | join(' ') }}; do
        if clawhub --workdir "$parent" --dir "$base" --no-input install {{ (item.install_force | default(false) | bool) | ternary('--force', '') }} "$n"; then rc=0; break; fi
      done
      exit "$rc"
    elif command -v find-skills >/dev/null 2>&1; then
      for n in {{ ([item.install_name | default(item.name)] + (item.aliases | default([]))) | unique | map('quote') | join(' ') }}; do
        if find-skills install "$n" --target "$target_dir"; then rc=0; break; fi
      done
      exit "$rc"
    elif [ "{{ agent_skills_auto_install_fail_on_missing_installer | bool | ternary('true', 'false') }}" = "true" ]; then
      echo "No installer (clawhub/find-skills) for $skill; preseed $target_dir/$skill." >&2
      exit 127
    else
      echo "Skipped missing $skill (no installer adapter)." >&2
    fi
  args:
    executable: /bin/bash
  become_user: "{{ agent_skills_user }}"
  register: agent_skills_install_result
  changed_when: agent_skills_install_result.rc == 0 and 'Skipped missing' not in agent_skills_install_result.stderr
  loop: "{{ agent_skills_missing_entries }}"
  loop_control:
    label: "{{ item.name }}"
  when:
    - agent_skills_auto_install_enabled | bool
    - agent_skills_missing_entries | length > 0
# --- Merge core skills into canonical (host-local copy; no rsync, no delegate) ---
- name: Merge core skills into canonical directory
  ansible.builtin.copy:
    src: "{{ agent_skills_xworkspace_core_source_dir }}/"
    dest: "{{ agent_skills_remote_dir }}/"
    remote_src: true
    owner: "{{ agent_skills_user }}"
    group: "{{ agent_skills_group }}"
    mode: preserve
  when:
    - agent_skills_xworkspace_core_enabled | bool
    - agent_skills_core_skills_stat.stat.isdir | default(false)
- name: Re-inspect required scenario skills in canonical dir
  ansible.builtin.shell: |
    set -eu
    d={{ agent_skills_remote_dir | quote }}
    for c in {{ ([item.name] + (item.aliases | default([]))) | unique | map('quote') | join(' ') }}; do
      if [ -f "$d/$c/SKILL.md" ]; then printf '%s\n' "$d/$c"; exit 0; fi
      m="$(find "$d" -type f -path "*/$c/SKILL.md" -print -quit 2>/dev/null || true)"
      if [ -n "$m" ]; then dirname "$m"; exit 0; fi
    done
    exit 1
  args:
    executable: /bin/bash
  register: agent_skills_presence_final
  changed_when: false
  failed_when: false
  check_mode: false
  loop: "{{ agent_skills_required_entries }}"
  loop_control:
    label: "{{ item.name }}"
- name: Assert required scenario skills are available
  ansible.builtin.assert:
    that:
      - (agent_skills_presence_final.results | selectattr('rc', 'ne', 0) | list | length) == 0
    fail_msg: >-
      Required scenario skills still missing under {{ agent_skills_remote_dir }}:
      {{ agent_skills_presence_final.results | selectattr('rc', 'ne', 0) | map(attribute='item.name') | join(', ') }}.
- name: Build resolved skill paths
  ansible.builtin.set_fact:
    agent_skills_resolved_paths: >-
      {{ agent_skills_presence_final.results | selectattr('rc', 'eq', 0) | map(attribute='stdout') | list | unique }}
- name: Run optional scenario skill quality gates
  ansible.builtin.shell: |
    set -eu
    skill_path={{ item.0 | quote }}
    gate_name={{ item.1.name | quote }}
    if command -v "$gate_name" >/dev/null 2>&1; then
      {{ item.1.argv_prefix | map('quote') | join(' ') }} "$skill_path"
    else
      echo "Skipped missing quality gate: $gate_name"
    fi
  args:
    executable: /bin/bash
  become_user: "{{ agent_skills_user }}"
  register: agent_skills_quality_gate_results
  changed_when: false
  failed_when: agent_skills_quality_gate_fail_on_error | bool and agent_skills_quality_gate_results.rc != 0
  loop: "{{ agent_skills_resolved_paths | product(agent_skills_quality_gate_commands) | list }}"
  loop_control:
    label: "{{ item.1.name }} {{ item.0 | basename }}"
  when:
    - agent_skills_quality_gate_enabled | bool
    - agent_skills_resolved_paths | length > 0
  check_mode: false
- name: Set canonical agent skills ownership
  ansible.builtin.file:
    path: "{{ agent_skills_remote_dir }}"
    state: directory
    owner: "{{ agent_skills_user }}"
    group: "{{ agent_skills_group }}"
    recurse: true
# --- Flatten categorized nested skills into symlinks at the canonical root (on the host) ---
- name: Link nested categorized skills at canonical root
  ansible.builtin.shell: |
    set -eu
    changed=0
    while IFS= read -r skill_manifest; do
      skill_dir="$(dirname "$skill_manifest")"
      skill_name="$(basename "$skill_dir")"
      link_path={{ agent_skills_remote_dir | quote }}/"$skill_name"
      if [ -e "$link_path" ] && [ ! -L "$link_path" ]; then continue; fi
      current_target=""
      if [ -L "$link_path" ]; then current_target="$(readlink "$link_path")"; fi
      if [ "$current_target" != "$skill_dir" ]; then
        if [ "{{ ansible_check_mode | ternary('true', 'false') }}" != "true" ]; then
          ln -sfn "$skill_dir" "$link_path"
        fi
        changed=1
      fi
    done < <(find {{ agent_skills_remote_dir | quote }} -mindepth 3 -name SKILL.md -type f -print)
    if [ "$changed" = "1" ]; then echo "<<CHANGED>>linked nested skills"; fi
  args:
    executable: /bin/bash
  register: agent_skills_flatten_result
  changed_when: "'<<CHANGED>>' in agent_skills_flatten_result.stdout"
  check_mode: false
  when: agent_skills_remote_flatten_nested_skills | bool
- name: Set canonical agent skills ownership after nested links
  ansible.builtin.file:
    path: "{{ agent_skills_remote_dir }}"
    state: directory
    owner: "{{ agent_skills_user }}"
    group: "{{ agent_skills_group }}"
    recurse: true
  when: agent_skills_remote_flatten_nested_skills | bool
# --- Symlink each agent's skills directory to canonical ------------------------
- name: Flatten agent skills target paths
  ansible.builtin.set_fact:
    agent_skills_target_paths: "{{ agent_skills_targets | subelements('paths') | map('last') | list }}"
- name: Inspect agent skills target paths
  ansible.builtin.stat:
    path: "{{ item }}"
  register: agent_skills_target_path_stats
  loop: "{{ agent_skills_target_paths }}"
- name: Reject existing non-link target directories unless replacement is enabled
  ansible.builtin.fail:
    msg: >-
      Agent skills target already exists and is not a symlink: {{ item.item }}.
      Set agent_skills_replace_existing_target_dirs=true to replace it with a link
      to {{ agent_skills_remote_dir }}.
  loop: "{{ agent_skills_target_path_stats.results }}"
  when:
    - item.stat.exists | default(false)
    - not item.stat.islnk | default(false)
    - item.item not in agent_skills_preserve_existing_target_dirs
    - not agent_skills_replace_existing_target_dirs | bool
- name: Replace existing non-link target directories when enabled
  ansible.builtin.file:
    path: "{{ item.item }}"
    state: absent
  loop: "{{ agent_skills_target_path_stats.results }}"
  when:
    - item.stat.exists | default(false)
    - not item.stat.islnk | default(false)
    - item.item not in agent_skills_preserve_existing_target_dirs
    - agent_skills_replace_existing_target_dirs | bool
- name: Build agent skills target parent paths
  ansible.builtin.set_fact:
    agent_skills_target_parent_paths: "{{ agent_skills_target_paths | map('dirname') | list | unique }}"
- name: Inspect agent skills target parent directories
  ansible.builtin.stat:
    path: "{{ item }}"
  register: agent_skills_target_parent_stats
  loop: "{{ agent_skills_target_parent_paths }}"
- name: Ensure agent skills target parent directories exist
  ansible.builtin.file:
    path: "{{ item.item }}"
    state: directory
    owner: "{{ agent_skills_user }}"
    group: "{{ agent_skills_group }}"
    mode: "0755"
  loop: "{{ agent_skills_target_parent_stats.results }}"
  when:
    - not item.stat.exists | default(false)
- name: Link agent skills targets to canonical directory
  ansible.builtin.file:
    src: "{{ agent_skills_remote_dir }}"
    dest: "{{ item }}"
    state: link
    force: true
  loop: "{{ agent_skills_target_paths }}"
  when: item not in agent_skills_preserve_existing_target_dirs
- name: Verify canonical skill manifests are present
  ansible.builtin.find:
    paths: "{{ agent_skills_remote_dir }}"
    patterns: SKILL.md
    recurse: true
    file_type: file
  register: agent_skills_manifest_files
- name: Assert synced agent skills contain manifests
  ansible.builtin.assert:
    that:
      - agent_skills_manifest_files.matched | int > 0
    fail_msg: "No SKILL.md files found under {{ agent_skills_remote_dir }}."
- name: Report synced agent skills
  ansible.builtin.debug:
    msg: >-
      {{ agent_skills_manifest_files.matched }} skill manifests under
      {{ agent_skills_remote_dir }}; linked {{ agent_skills_target_paths | length }} agent targets.

@ -0,0 +1,48 @@
# AI Agent Runtime
Provision a Debian-based host for AI agent and AI action execution with one
role entrypoint. The role installs:
- base tools: `curl`, `wget`, `git`, `jq`, `rsync`, `unzip`
- Node.js runtime for Playwright-based agents
- Python 3 toolchain for scripts and helpers
- existing system browser, preferring the live `/usr/local/bin/chromium` wrapper
or Google Chrome before installing browser packages
- `pandoc` + XeLaTeX PDF toolchain
- Chinese fonts for document rendering
- shared agent skills via `roles/agent_skills`, including the categorized
`../xworkspace-core-skills/skills/` repository source by default
Design constraints:
- system packages are the primary source of truth
- global npm packages are managed through
`/usr/local/sbin/ai-workspace-manage-npm-global-package` so repeated installs
are idempotent and stale global bin links can be overwritten safely
- Playwright uses the resolved system browser instead of downloading browsers
- Chinese PDF rendering is treated as a runtime requirement, not an optional add-on
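The "resolved system browser" constraint can be sketched as a first-match probe over candidate names and paths. This is a simplified illustration: `resolve_browser` is a hypothetical function name, and the candidate list passed to it is up to the caller. A candidate is accepted only if it actually answers `--version`, which filters out disabled stubs that exist on disk but cannot run.
```bash
#!/usr/bin/env bash
# Sketch only: resolve_browser is a hypothetical helper.
# Prints the first candidate that exists AND runs "--version" successfully.
resolve_browser() {
  local candidate resolved
  for candidate in "$@"; do
    resolved=""
    if command -v "$candidate" >/dev/null 2>&1; then
      # Bare names are looked up on PATH; absolute paths are returned as-is.
      resolved="$(command -v "$candidate")"
    elif [ -x "$candidate" ]; then
      resolved="$candidate"
    fi
    # Existence is not enough: the binary must really execute.
    if [ -n "$resolved" ] && "$resolved" --version >/dev/null 2>&1; then
      printf '%s\n' "$resolved"
      return 0
    fi
  done
  return 1
}
```
A caller might run `resolve_browser /usr/local/bin/chromium google-chrome-stable chromium` and export the printed path for Playwright, falling back to a package install only when the function returns non-zero.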
Global npm package actions:
- `install` is the default and only changes the host when a package is missing
or an exact pinned version differs
- `reinstall` forces the configured package set back into place
- `upgrade` currently re-runs the forced install of the configured spec, the
same as `reinstall`
- `backup`, `restore`, and `migrate` are reserved no-op entrypoints for future
runtime lifecycle workflows
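The idempotency rule for `install` boils down to a two-input decision: the version currently installed (empty if none) and the version pinned in the spec (empty if unpinned). The sketch below restates that rule in isolation; `needs_install` is a hypothetical name used for illustration and is not a function of the helper script.
```bash
#!/usr/bin/env bash
# Sketch only: needs_install is a hypothetical restatement of the install rule.
# Usage: needs_install <installed_version_or_empty> [pinned_version_or_empty]
# Returns 0 when npm should run, 1 when the host is already in the desired state.
needs_install() {
  local have="$1" want="${2:-}"
  # Package missing entirely: install.
  if [ -z "$have" ]; then return 0; fi
  # Exact pin requested and it differs from what is installed: install.
  if [ -n "$want" ] && [ "$have" != "$want" ]; then return 0; fi
  # Installed and either unpinned or already at the pinned version: no change.
  return 1
}
```
With these semantics an unpinned entry such as `opencode-ai` is never touched once present, while `playwright@1.60.0` is reinstalled only if a different version is found.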
Default Playwright environment:
- `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1`
- `PLAYWRIGHT_BROWSERS_PATH=0`
- `PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/local/bin/chromium` when that live
wrapper exists
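To confirm that a login shell actually picked up these defaults, a check along the following lines can help. It is a sketch under assumptions: `check_playwright_env` is a hypothetical function, and it only tests the three variables listed above against their documented default values.
```bash
#!/usr/bin/env bash
# Sketch only: check_playwright_env is a hypothetical helper.
# Returns 0 when the three documented defaults are in effect, 1 otherwise.
check_playwright_env() {
  local status=0
  if [ "${PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD:-}" != "1" ]; then
    echo "PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD is not set to 1" >&2
    status=1
  fi
  if [ "${PLAYWRIGHT_BROWSERS_PATH:-}" != "0" ]; then
    echo "PLAYWRIGHT_BROWSERS_PATH is not set to 0" >&2
    status=1
  fi
  # The executable path must point at something that can actually be run.
  if [ ! -x "${PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH:-}" ]; then
    echo "PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH is missing or not executable" >&2
    status=1
  fi
  return "$status"
}
```
On a Debian-family host the role writes these exports to `/etc/profile.d/ai_agent_runtime_playwright.sh`, so the check is meaningful in a fresh login shell or after sourcing that file.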
Example:
```bash
ansible-playbook -i inventory.ini -l jp-xhttp-contabo.svc.plus setup-ai-agent-skills.yml
```
`setup-ai-agent-skills.yml` runs `roles/ai_agent_runtime`, which installs system
dependencies and syncs the current Skill catalog through the embedded
`roles/agent_skills` step in one pass.

@ -0,0 +1,66 @@
---
ai_agent_runtime_base_packages:
  - ca-certificates
  - curl
  - git
  - jq
  - rsync
  - unzip
  - wget
ai_agent_runtime_nodejs_enabled: true
ai_agent_runtime_nodejs_version: "24.16.0"
ai_agent_runtime_install_yarn: true
ai_agent_runtime_yarn_version: ""
ai_agent_runtime_npm_global_packages:
  - opencode-ai
  - "@google/gemini-cli"
  - "@openai/codex"
  - "@anthropic-ai/claude-code"
ai_agent_runtime_npm_global_package_action: install
ai_agent_runtime_agent_cli_commands:
  - opencode
  - gemini
  - codex
  - claude
ai_agent_runtime_playwright_enabled: true
ai_agent_runtime_playwright_version: "1.60.0"
ai_agent_runtime_playwright_skip_browser_download: true
ai_agent_runtime_python_enabled: true
ai_agent_runtime_python_packages:
  - python3
  - python3-pip
  - python3-venv
  - python3-dev
  - python3-setuptools
  - build-essential
  - pkg-config
  - python-is-python3
ai_agent_runtime_browser_enabled: true
ai_agent_runtime_browser_packages:
  - google-chrome-stable
ai_agent_runtime_browser_executable: /usr/local/bin/chromium
ai_agent_runtime_docs_enabled: true
ai_agent_runtime_doc_packages:
  - pandoc
  - texlive-xetex
  - texlive-latex-extra
  - texlive-fonts-recommended
  - texlive-lang-chinese
  - latexmk
ai_agent_runtime_fonts_enabled: true
ai_agent_runtime_font_packages:
  - fonts-noto-cjk
  - fonts-noto-cjk-extra
  - fonts-wqy-zenhei
  - fonts-wqy-microhei
ai_agent_runtime_skills_enabled: true
ai_agent_runtime_skills_role_name: agent_skills
ai_agent_runtime_verify_enabled: true
ai_agent_runtime_verify_chinese_fonts: true

@ -0,0 +1,113 @@
#!/usr/bin/env bash
set -euo pipefail
action="${1:-install}"
package_spec="${2:-}"
if [ -z "${package_spec}" ]; then
  echo "Usage: $0 <install|reinstall|upgrade|backup|restore|migrate> <npm-package-spec>" >&2
  exit 2
fi
# Strip the version suffix from a spec: "@scope/name@1.2.3" -> "@scope/name".
package_name() {
  local spec="$1"
  if [[ "${spec}" == @* ]]; then
    local rest="${spec#@}"
    local scope="${rest%%/*}"
    local after_scope="${rest#*/}"
    local name="${after_scope%%@*}"
    printf '@%s/%s\n' "${scope}" "${name}"
  else
    printf '%s\n' "${spec%%@*}"
  fi
}
# Print the version pinned in a spec, or nothing when the spec is unpinned.
desired_version() {
  local spec="$1"
  if [[ "${spec}" == @* ]]; then
    local rest="${spec#@}"
    local after_scope="${rest#*/}"
    if [[ "${after_scope}" == *"@"* ]]; then
      printf '%s\n' "${after_scope#*@}"
    fi
  elif [[ "${spec}" == *"@"* ]]; then
    printf '%s\n' "${spec#*@}"
  fi
}
# Read the installed version from the global package.json; exit 1 if absent.
installed_version() {
  local name="$1"
  local npm_root
  npm_root="$(npm root -g)"
  node -e '
    const fs = require("fs");
    const path = require("path");
    const pkg = process.argv[1];
    const root = process.argv[2];
    const packageJson = path.join(root, ...pkg.split("/"), "package.json");
    if (!fs.existsSync(packageJson)) process.exit(1);
    const parsed = JSON.parse(fs.readFileSync(packageJson, "utf8"));
    process.stdout.write(parsed.version || "");
  ' "${name}" "${npm_root}"
}
# True when the package is present and, if a version is pinned, matches it.
is_installed() {
  local name="$1"
  local want="${2:-}"
  local have
  have="$(installed_version "${name}" 2>/dev/null || true)"
  [ -n "${have}" ] || return 1
  [ -z "${want}" ] || [ "${have}" = "${want}" ]
}
install_package() {
  local spec="$1"
  local name want
  name="$(package_name "${spec}")"
  want="$(desired_version "${spec}")"
  if is_installed "${name}" "${want}"; then
    echo "changed=0 action=install package=${spec}"
    return
  fi
  npm install -g --force "${spec}"
  echo "changed=1 action=install package=${spec}"
}
reinstall_package() {
  local spec="$1"
  npm install -g --force "${spec}"
  echo "changed=1 action=reinstall package=${spec}"
}
upgrade_package() {
  local spec="$1"
  npm install -g --force "${spec}"
  echo "changed=1 action=upgrade package=${spec}"
}
# backup/restore/migrate are reserved: they report status and change nothing.
backup_package() {
  echo "changed=0 action=backup package=${1} status=reserved"
}
restore_package() {
  echo "changed=0 action=restore package=${1} status=reserved"
}
migrate_package() {
  echo "changed=0 action=migrate package=${1} status=reserved"
}
case "${action}" in
  install) install_package "${package_spec}" ;;
  reinstall) reinstall_package "${package_spec}" ;;
  upgrade) upgrade_package "${package_spec}" ;;
  backup) backup_package "${package_spec}" ;;
  restore) restore_package "${package_spec}" ;;
  migrate) migrate_package "${package_spec}" ;;
  *)
    echo "Unsupported npm package action: ${action}" >&2
    exit 2
    ;;
esac

@ -0,0 +1,111 @@
---
- name: Resolve existing Chromium executable
  ansible.builtin.shell: |
    set -eu
    for candidate in \
      "{{ ai_agent_runtime_browser_executable }}" \
      /usr/local/bin/chromium \
      /usr/local/bin/chromium-browser \
      /usr/bin/google-chrome \
      /usr/bin/google-chrome-stable \
      chromium \
      chromium-browser \
      google-chrome \
      google-chrome-stable \
      /usr/bin/chromium \
      /snap/bin/chromium; do
      resolved=""
      if command -v "$candidate" >/dev/null 2>&1; then
        resolved="$(command -v "$candidate")"
      elif [ -x "$candidate" ]; then
        resolved="$candidate"
      fi
      # Must actually run: skip the disabled chromium stub installed by xfce (exit code 126);
      # otherwise the resolver would pick it and the later --version check would fail.
      if [ -n "$resolved" ] && "$resolved" --version >/dev/null 2>&1; then
        printf '%s\n' "$resolved"
        exit 0
      fi
    done
    exit 1
  args:
    executable: /bin/sh
  register: ai_agent_runtime_browser_resolve
  changed_when: false
  failed_when: false
  check_mode: false
- name: Install AI runtime browser packages when no browser exists
  ansible.builtin.apt:
    name: "{{ ai_agent_runtime_browser_packages }}"
    state: present
    update_cache: true
    install_recommends: false
    # Wait for the dpkg frontend lock instead of failing at once when cloud-init/unattended-upgrades holds it.
    lock_timeout: "{{ ai_workspace_apt_lock_timeout | default(900) | int }}"
  environment:
    DEBIAN_FRONTEND: noninteractive
    APT_LISTCHANGES_FRONTEND: none
  become: true
  when:
    - ai_agent_runtime_browser_resolve.rc != 0
    - ansible_os_family != 'Darwin'
- name: Resolve Chromium executable
  ansible.builtin.shell: |
    set -eu
    for candidate in \
      "{{ ai_agent_runtime_browser_executable }}" \
      /usr/local/bin/chromium \
      /usr/local/bin/chromium-browser \
      /usr/bin/google-chrome \
      /usr/bin/google-chrome-stable \
      chromium \
      chromium-browser \
      google-chrome-stable \
      /usr/bin/chromium \
      /snap/bin/chromium \
      "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"; do
      resolved=""
      if command -v "$candidate" >/dev/null 2>&1; then
        resolved="$(command -v "$candidate")"
      elif [ -x "$candidate" ]; then
        resolved="$candidate"
      fi
      # Must actually run: skip the disabled chromium stub installed by xfce (exit code 126);
      # otherwise the resolver would pick it and the later --version check would fail.
      if [ -n "$resolved" ] && "$resolved" --version >/dev/null 2>&1; then
        printf '%s\n' "$resolved"
        exit 0
      fi
    done
    exit 1
  args:
    executable: /bin/sh
  register: ai_agent_runtime_browser_resolve
  changed_when: false
  check_mode: false
- name: Set resolved Chromium executable
  ansible.builtin.set_fact:
    ai_agent_runtime_browser_resolved_executable: "{{ ai_agent_runtime_browser_resolve.stdout }}"
- name: Ensure AI workspace env directory exists on macOS
  ansible.builtin.file:
    path: "{{ xworkspace_console_home | default(ansible_env.HOME) }}/.local/state/ai-workspace/env"
    state: directory
    mode: "0755"
  when: ansible_os_family == 'Darwin'
- name: Configure Playwright runtime environment
  ansible.builtin.copy:
    dest: "{{ '/etc/profile.d' if ansible_os_family != 'Darwin' else (xworkspace_console_home | default(ansible_env.HOME)) + '/.local/state/ai-workspace/env' }}/ai_agent_runtime_playwright.sh"
    owner: "{{ 'root' if ansible_os_family != 'Darwin' else (xworkspace_console_user | default(ansible_env.USER)) }}"
    # The original nested a second Darwin check here whose else branch could never be reached.
    group: "{{ 'root' if ansible_os_family != 'Darwin' else 'staff' }}"
    mode: "0644"
    content: |
      export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD={{ '1' if ai_agent_runtime_playwright_skip_browser_download | bool else '0' }}
      export PLAYWRIGHT_BROWSERS_PATH=0
      export PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH={{ ai_agent_runtime_browser_resolved_executable }}
  become: "{{ ansible_os_family != 'Darwin' }}"
  when: ai_agent_runtime_playwright_enabled | bool

@ -0,0 +1,14 @@
---
- name: Install AI runtime document packages
  ansible.builtin.apt:
    name: "{{ ai_agent_runtime_doc_packages }}"
    state: present
    update_cache: true
    install_recommends: false
    # Wait for the dpkg frontend lock instead of failing at once when cloud-init/unattended-upgrades holds it.
    lock_timeout: "{{ ai_workspace_apt_lock_timeout | default(900) | int }}"
  environment:
    DEBIAN_FRONTEND: noninteractive
    APT_LISTCHANGES_FRONTEND: none
  become: true
  when: ansible_os_family != 'Darwin'

@ -0,0 +1,14 @@
---
- name: Install AI runtime font packages
  ansible.builtin.apt:
    name: "{{ ai_agent_runtime_font_packages }}"
    state: present
    update_cache: true
    install_recommends: false
    # Wait for the dpkg frontend lock instead of failing at once when cloud-init/unattended-upgrades holds it.
    lock_timeout: "{{ ai_workspace_apt_lock_timeout | default(900) | int }}"
  environment:
    DEBIAN_FRONTEND: noninteractive
    APT_LISTCHANGES_FRONTEND: none
  become: true
  when: ansible_os_family != 'Darwin'

@ -0,0 +1,52 @@
---
- name: Assert AI agent runtime runs on a supported OS family (Debian, Darwin, Windows)
  ansible.builtin.assert:
    that:
      - ansible_facts.os_family in ["Debian", "Darwin", "Windows"]
    fail_msg: "roles/ai_agent_runtime currently supports Debian-based, Darwin, and Windows hosts only."
- name: Install AI agent runtime base packages
  ansible.builtin.apt:
    name: "{{ ai_agent_runtime_base_packages }}"
    state: present
    update_cache: true
    install_recommends: false
    # Wait for the dpkg frontend lock instead of failing at once when cloud-init/unattended-upgrades holds it.
    lock_timeout: "{{ ai_workspace_apt_lock_timeout | default(900) | int }}"
  environment:
    DEBIAN_FRONTEND: noninteractive
    APT_LISTCHANGES_FRONTEND: none
  become: true
  when: ansible_os_family == 'Debian'
- name: Configure Node.js runtime
  ansible.builtin.include_tasks: nodejs.yml
  when: ai_agent_runtime_nodejs_enabled | bool
- name: Configure Python runtime
  ansible.builtin.include_tasks: python.yml
  when: ai_agent_runtime_python_enabled | bool
- name: Configure browser runtime
  ansible.builtin.include_tasks: browser.yml
  when: ai_agent_runtime_browser_enabled | bool
- name: Configure document runtime
  ansible.builtin.include_tasks: docs.yml
  when: ai_agent_runtime_docs_enabled | bool
- name: Configure font runtime
  ansible.builtin.include_tasks: fonts.yml
  when: ai_agent_runtime_fonts_enabled | bool
- name: Configure shared agent skills
  ansible.builtin.include_role:
    name: "{{ ai_agent_runtime_skills_role_name }}"
    apply:
      tags: agent_skills
  when: ai_agent_runtime_skills_enabled | bool
  tags: [agent_skills]
- name: Verify AI agent runtime
  ansible.builtin.include_tasks: verify.yml
  when: ai_agent_runtime_verify_enabled | bool

@ -0,0 +1,49 @@
---
- name: Install Node.js role
  ansible.builtin.include_role:
    name: roles/vhosts/nodejs
  vars:
    nodejs_version: "{{ ai_agent_runtime_nodejs_version }}"
    install_yarn: "{{ ai_agent_runtime_install_yarn }}"
    yarn_version: "{{ ai_agent_runtime_yarn_version }}"
- name: Ensure user local bin directory exists on macOS
  ansible.builtin.file:
    path: "{{ ansible_env.HOME }}/.local/bin"
    state: directory
    mode: "0755"
  when: ansible_os_family == 'Darwin'
- name: Install npm global package manager helper
  ansible.builtin.copy:
    src: manage_npm_global_package.sh
    dest: "{{ '/usr/local/sbin' if ansible_os_family != 'Darwin' else ansible_env.HOME + '/.local/bin' }}/ai-workspace-manage-npm-global-package"
    # Fall back to the connecting user when xworkspace_console_user is unset, as browser.yml does.
    owner: "{{ 'root' if ansible_os_family != 'Darwin' else (xworkspace_console_user | default(ansible_env.USER)) }}"
    group: "{{ 'root' if ansible_os_family != 'Darwin' else 'staff' }}"
    mode: "0755"
  become: "{{ ansible_os_family != 'Darwin' }}"
  when: ansible_os_family != 'Windows'
- name: Install global npm packages for AI runtime
  ansible.builtin.command:
    cmd: "{{ '/usr/local/sbin' if ansible_os_family != 'Darwin' else ansible_env.HOME + '/.local/bin' }}/ai-workspace-manage-npm-global-package {{ ai_agent_runtime_npm_global_package_action }} {{ item }}"
  loop: "{{ ai_agent_runtime_npm_global_packages }}"
  register: ai_agent_runtime_npm_global_install
  changed_when: "'changed=1' in ai_agent_runtime_npm_global_install.stdout"
  when:
    - ai_agent_runtime_npm_global_packages | length > 0
    - ansible_os_family != 'Windows'
- name: Install pinned Playwright package for AI runtime
  ansible.builtin.command:
    cmd: "{{ '/usr/local/sbin' if ansible_os_family != 'Darwin' else ansible_env.HOME + '/.local/bin' }}/ai-workspace-manage-npm-global-package {{ ai_agent_runtime_npm_global_package_action }} playwright@{{ ai_agent_runtime_playwright_version }}"
  register: ai_agent_runtime_playwright_install
  changed_when: "'changed=1' in ai_agent_runtime_playwright_install.stdout"
  when:
    - ai_agent_runtime_playwright_enabled | bool
    - ai_agent_runtime_playwright_version | length > 0
    - ansible_os_family != 'Windows'
- name: Include Windows specific Node.js package tasks
  ansible.builtin.include_tasks: windows.yml
  when: ansible_os_family == 'Windows'

@ -0,0 +1,7 @@
---
- name: Install Python 3 role
  ansible.builtin.include_role:
    name: roles/vhosts/python3
  vars:
    python3_packages: "{{ ai_agent_runtime_python_packages }}"
    python3_install_recommends: false

@ -0,0 +1,98 @@
---
- name: Check node version
  ansible.builtin.command: node --version
  register: ai_agent_runtime_node_version
  changed_when: false
  check_mode: false
  when: ai_agent_runtime_nodejs_enabled | bool
- name: Check npm version
  ansible.builtin.command: npm --version
  register: ai_agent_runtime_npm_version
  changed_when: false
  check_mode: false
  when: ai_agent_runtime_nodejs_enabled | bool
- name: Check python version
  ansible.builtin.command: python3 --version
  register: ai_agent_runtime_python_version
  changed_when: false
  check_mode: false
  when: ai_agent_runtime_python_enabled | bool
- name: Check pip version
  ansible.builtin.command: pip3 --version
  register: ai_agent_runtime_pip_version
  changed_when: false
  check_mode: false
  when: ai_agent_runtime_python_enabled | bool
- name: Check chromium version
  ansible.builtin.command:
    argv:
      - "{{ ai_agent_runtime_browser_resolved_executable | default(ai_agent_runtime_browser_executable) }}"
      - "--version"
  register: ai_agent_runtime_chromium_version
  changed_when: false
  check_mode: false
  when: ai_agent_runtime_browser_enabled | bool
- name: Check pandoc version
  ansible.builtin.command: pandoc --version
  register: ai_agent_runtime_pandoc_version
  changed_when: false
  check_mode: false
  when: ai_agent_runtime_docs_enabled | bool
- name: Check xelatex version
  ansible.builtin.command: xelatex --version
  register: ai_agent_runtime_xelatex_version
  changed_when: false
  check_mode: false
  when: ai_agent_runtime_docs_enabled | bool
- name: Check Chinese font inventory
  ansible.builtin.command: fc-list :lang=zh family
  register: ai_agent_runtime_chinese_fonts
  changed_when: false
  check_mode: false
  when:
    - ai_agent_runtime_fonts_enabled | bool
    - ai_agent_runtime_verify_chinese_fonts | bool
- name: Assert Chinese fonts are available
  ansible.builtin.assert:
    that:
      - ai_agent_runtime_chinese_fonts.stdout | length > 0
    fail_msg: "No Chinese fonts were discovered by fontconfig."
  when:
    - ai_agent_runtime_fonts_enabled | bool
    - ai_agent_runtime_verify_chinese_fonts | bool
- name: Check agent CLI versions
  ansible.builtin.command: "{{ item }} --version"
  register: ai_agent_runtime_agent_cli_versions
  changed_when: false
  failed_when: false
  check_mode: false
  loop: "{{ ai_agent_runtime_agent_cli_commands }}"
  when:
    - ai_agent_runtime_nodejs_enabled | bool
    - ai_agent_runtime_agent_cli_commands | length > 0
- name: Report AI runtime versions
  ansible.builtin.debug:
    msg:
      node: "{{ ai_agent_runtime_node_version.stdout | default('disabled') }}"
      npm: "{{ ai_agent_runtime_npm_version.stdout | default('disabled') }}"
      python3: "{{ ai_agent_runtime_python_version.stdout | default('disabled') }}"
      pip3: "{{ ai_agent_runtime_pip_version.stdout | default('disabled') }}"
      chromium: "{{ ai_agent_runtime_chromium_version.stdout | default('disabled') }}"
      pandoc: "{{ (ai_agent_runtime_pandoc_version.stdout_lines | default(['disabled']))[0] }}"
      xelatex: "{{ (ai_agent_runtime_xelatex_version.stdout_lines | default(['disabled']))[0] }}"
      chinese_font_count: "{{ (ai_agent_runtime_chinese_fonts.stdout_lines | default([])) | length }}"
      agent_cli: >-
        {{
          ai_agent_runtime_agent_cli_versions.results | default([])
          | items2dict(key_name='item', value_name='stdout')
        }}

View File

@ -0,0 +1,17 @@
---
- name: Install global npm packages for AI runtime on Windows
ansible.windows.win_command:
cmd: "npm install -g {{ item }}"
loop: "{{ ai_agent_runtime_npm_global_packages }}"
register: ai_agent_runtime_npm_global_install_win
changed_when: "'added' in ai_agent_runtime_npm_global_install_win.stdout or 'updated' in ai_agent_runtime_npm_global_install_win.stdout"
when: ai_agent_runtime_npm_global_packages | length > 0
- name: Install pinned Playwright package for AI runtime on Windows
ansible.windows.win_command:
cmd: "npm install -g playwright@{{ ai_agent_runtime_playwright_version }}"
register: ai_agent_runtime_playwright_install_win
changed_when: "'added' in ai_agent_runtime_playwright_install_win.stdout or 'updated' in ai_agent_runtime_playwright_install_win.stdout"
when:
- ai_agent_runtime_playwright_enabled | bool
- ai_agent_runtime_playwright_version | length > 0

View File

@ -0,0 +1,15 @@
---
gpu_operator_namespace: "gpu-operator"
gpu_operator_release_name: "gpu-operator"
gpu_operator_chart_version: "v24.3.0"
# Air-gapped / Private registry support
gpu_operator_repository: "https://helm.ngc.nvidia.com/nvidia"
image_pull_secrets: []
# Operator settings
driver_enabled: true
driver_version: "535.129.03"
toolkit_enabled: true
mig_strategy: "single" # none, single, mixed
dcgm_exporter_enabled: true

View File

@ -0,0 +1 @@
---

View File

@ -0,0 +1,28 @@
---
- name: Create GPU Operator namespace
kubernetes.core.k8s:
api_version: v1
kind: Namespace
name: "{{ gpu_operator_namespace }}"
state: present
when: inventory_hostname == groups['masters'][0]
- name: Add NVIDIA helm repo
kubernetes.core.helm_repository:
name: nvidia
repo_url: "{{ gpu_operator_repository }}"
when: inventory_hostname == groups['masters'][0]
- name: Deploy GPU Operator
kubernetes.core.helm:
name: "{{ gpu_operator_release_name }}"
chart_ref: nvidia/gpu-operator
release_namespace: "{{ gpu_operator_namespace }}"
version: "{{ gpu_operator_chart_version }}"
values: "{{ lookup('template', 'values.yaml.j2') | from_yaml }}"
wait: true
when: inventory_hostname == groups['masters'][0]
- name: Include validation tasks
include_tasks: validate.yml
when: inventory_hostname == groups['masters'][0]

View File

@ -0,0 +1,15 @@
---
- name: Wait for NVIDIA Device Plugin daemonset to be ready
shell: |
kubectl rollout status daemonset/nvidia-device-plugin-daemonset -n {{ gpu_operator_namespace }} --timeout=300s
register: ds_status
changed_when: false
- name: Validate GPU resources are allocatable
shell: |
kubectl get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[*].status.allocatable}'
register: gpu_allocatable
until: "'nvidia.com/gpu' in gpu_allocatable.stdout"
retries: 30
delay: 20
changed_when: false

View File

@ -0,0 +1,15 @@
driver:
enabled: {{ driver_enabled }}
version: "{{ driver_version }}"
toolkit:
enabled: {{ toolkit_enabled }}
devicePlugin:
enabled: true
mig:
strategy: "{{ mig_strategy }}"
dcgmExporter:
enabled: {{ dcgm_exporter_enabled }}
{% if image_pull_secrets | length > 0 %}
imagePullSecrets:
{{ image_pull_secrets | to_nice_yaml(indent=2) | indent(2, true) }}
{% endif %}

View File

@ -0,0 +1,36 @@
---
ray_namespace: "ray-system"
ray_cluster_name: "ray-cluster"
ray_image: "rayproject/ray:2.9.0"
ray_version: "2.9.0"
ray_dashboard_enabled: true
ray_head_resources:
requests:
cpu: "2"
memory: "8Gi"
limits:
cpu: "4"
memory: "16Gi"
ray_worker_groups:
- groupName: gpu-workers
replicas: 2
minReplicas: 1
maxReplicas: 4
resources:
requests:
cpu: "4"
memory: "32Gi"
nvidia.com/gpu: "1"
limits:
cpu: "8"
memory: "64Gi"
nvidia.com/gpu: "1"
nodeSelector:
accelerator: "nvidia-h100"
tolerations: []
volumeMounts:
- mountPath: /dev/shm
name: dshm

View File

@ -0,0 +1 @@
---

View File

@ -0,0 +1,24 @@
---
- name: Create Ray namespace
kubernetes.core.k8s:
name: "{{ ray_namespace }}"
api_version: v1
kind: Namespace
state: present
when: inventory_hostname == groups['masters'][0]
- name: Apply RayCluster custom resource
kubernetes.core.k8s:
state: present
definition: "{{ lookup('template', 'raycluster.yaml.j2') | from_yaml }}"
when: inventory_hostname == groups['masters'][0]
- name: Wait for Ray head node to be ready
shell: |
kubectl get pod -n {{ ray_namespace }} -l ray.io/node-type=head -o jsonpath='{.items[0].status.phase}'
register: head_status
until: head_status.stdout == "Running"
retries: 30
delay: 10
changed_when: false
when: inventory_hostname == groups['masters'][0]

View File

@ -0,0 +1,53 @@
apiVersion: ray.io/v1
kind: RayCluster
metadata:
name: {{ ray_cluster_name }}
namespace: {{ ray_namespace }}
spec:
rayVersion: '{{ ray_version }}'
headGroupSpec:
rayStartParams:
dashboard-host: '0.0.0.0'
{% if not ray_dashboard_enabled %}
dashboard-enabled: 'false'
{% endif %}
template:
spec:
containers:
- name: ray-head
image: {{ ray_image }}
resources:
{{ ray_head_resources | to_nice_yaml(indent=4) | indent(12, true) }}
workerGroupSpecs:
{% for group in ray_worker_groups %}
- groupName: {{ group.groupName }}
replicas: {{ group.replicas }}
minReplicas: {{ group.minReplicas }}
maxReplicas: {{ group.maxReplicas }}
rayStartParams: {}
template:
spec:
{% if group.nodeSelector is defined %}
nodeSelector:
{{ group.nodeSelector | to_nice_yaml(indent=2) | indent(10, true) }}
{% endif %}
{% if group.tolerations is defined and group.tolerations | length > 0 %}
tolerations:
{{ group.tolerations | to_nice_yaml(indent=2) | indent(10, true) }}
{% endif %}
containers:
- name: ray-worker
image: {{ ray_image }}
resources:
{{ group.resources | to_nice_yaml(indent=4) | indent(12, true) }}
{% if group.volumeMounts is defined and group.volumeMounts | length > 0 %}
volumeMounts:
{{ group.volumeMounts | to_nice_yaml(indent=2) | indent(10, true) }}
{% endif %}
{% if group.volumeMounts is defined and group.volumeMounts | selectattr('name', 'equalto', 'dshm') | list | length > 0 %}
volumes:
- name: dshm
emptyDir:
medium: Memory
{% endif %}
{% endfor %}

View File

@ -0,0 +1 @@
---

View File

@ -0,0 +1 @@
---

View File

@ -0,0 +1,31 @@
---
vllm_namespace: "vllm-system"
vllm_service_name: "vllm-api"
vllm_image: "vllm/vllm-openai:v0.4.2"
vllm_model: "/models/Llama-3-70B-Instruct"
vllm_tensor_parallel_size: 2
vllm_pipeline_parallel_size: 1
vllm_gpu_memory_utilization: 0.90
vllm_max_model_len: 4096
vllm_max_num_seqs: 256
vllm_port: 8000
vllm_service_type: "ClusterIP"
vllm_ingress_enabled: false
vllm_ingress_host: "vllm.example.com"
# Ray Integration
ray_address: "ray://ray-cluster-head-svc.ray-system.svc.cluster.local:10001"
# Environment Variables
nccl_socket_ifname: "eth0"
gloo_socket_ifname: "eth0"
nccl_ib_disable: "1"
vllm_logging_level: "INFO"
torch_distributed_init_timeout: "300"
huggingface_token: ""
# Model Mount
model_host_path: "/data/models"
model_mount_path: "/models"

View File

@ -0,0 +1 @@
---

View File

@ -0,0 +1,36 @@
---
- name: Create vLLM namespace
kubernetes.core.k8s:
name: "{{ vllm_namespace }}"
api_version: v1
kind: Namespace
state: present
when: inventory_hostname == groups['masters'][0]
- name: Deploy vLLM Deployment
kubernetes.core.k8s:
state: present
definition: "{{ lookup('template', 'deployment.yaml.j2') | from_yaml }}"
when: inventory_hostname == groups['masters'][0]
- name: Deploy vLLM Service
kubernetes.core.k8s:
state: present
definition: "{{ lookup('template', 'service.yaml.j2') | from_yaml }}"
when: inventory_hostname == groups['masters'][0]
- name: Deploy vLLM Ingress
kubernetes.core.k8s:
state: present
definition: "{{ lookup('template', 'ingress.yaml.j2') | from_yaml }}"
when: inventory_hostname == groups['masters'][0] and vllm_ingress_enabled
- name: Wait for vLLM API to be ready
shell: |
kubectl get deploy {{ vllm_service_name }} -n {{ vllm_namespace }} -o jsonpath='{.status.readyReplicas}'
register: vllm_ready
until: vllm_ready.stdout == "1"
retries: 40
delay: 15
changed_when: false
when: inventory_hostname == groups['masters'][0]

View File

@ -0,0 +1,65 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ vllm_service_name }}
namespace: {{ vllm_namespace }}
spec:
replicas: 1
selector:
matchLabels:
app: {{ vllm_service_name }}
template:
metadata:
labels:
app: {{ vllm_service_name }}
spec:
containers:
- name: vllm
image: {{ vllm_image }}
command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
args:
- "--model={{ vllm_model }}"
- "--tensor-parallel-size={{ vllm_tensor_parallel_size }}"
- "--pipeline-parallel-size={{ vllm_pipeline_parallel_size }}"
- "--gpu-memory-utilization={{ vllm_gpu_memory_utilization }}"
- "--max-model-len={{ vllm_max_model_len }}"
- "--max-num-seqs={{ vllm_max_num_seqs }}"
- "--port={{ vllm_port }}"
- "--distributed-executor-backend=ray"
- "--worker-use-ray"
env:
- name: RAY_ADDRESS
value: "{{ ray_address }}"
- name: NCCL_SOCKET_IFNAME
value: "{{ nccl_socket_ifname }}"
- name: GLOO_SOCKET_IFNAME
value: "{{ gloo_socket_ifname }}"
- name: NCCL_IB_DISABLE
value: "{{ nccl_ib_disable }}"
- name: VLLM_LOGGING_LEVEL
value: "{{ vllm_logging_level }}"
- name: TORCH_DISTRIBUTED_INIT_TIMEOUT
value: "{{ torch_distributed_init_timeout }}"
- name: HUGGING_FACE_HUB_TOKEN
value: "{{ huggingface_token }}"
ports:
- containerPort: {{ vllm_port }}
readinessProbe:
httpGet:
path: /health
port: {{ vllm_port }}
initialDelaySeconds: 30
periodSeconds: 10
volumeMounts:
- name: dshm
mountPath: /dev/shm
- name: models
mountPath: {{ model_mount_path }}
volumes:
- name: dshm
emptyDir:
medium: Memory
- name: models
hostPath:
path: {{ model_host_path }}
type: DirectoryOrCreate

View File

@ -0,0 +1,17 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: {{ vllm_service_name }}-ingress
namespace: {{ vllm_namespace }}
spec:
rules:
- host: {{ vllm_ingress_host }}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: {{ vllm_service_name }}
port:
number: {{ vllm_port }}

View File

@ -0,0 +1,51 @@
apiVersion: ray.io/v1
kind: RayService
metadata:
name: {{ service_name }}
namespace: default
spec:
serveConfigV2: |
applications:
- name: vllm_app
import_path: "vllm.engine.arg_utils:AsyncEngineArgs"
route_prefix: /
rayClusterConfig:
rayVersion: {{ kuberay_version }}
headGroupSpec:
rayStartParams:
dashboard-host: '0.0.0.0'
template:
spec:
containers:
- name: ray-head
image: rayproject/ray:{{ kuberay_version }}
workerGroupSpecs:
- replicas: {{ worker_replicas }}
minReplicas: 1
maxReplicas: {{ max_replicas }}
groupName: gpu-group
rayStartParams: {}
template:
spec:
containers:
- name: vllm-node
image: {{ vllm_image }}
resources:
limits:
nvidia.com/gpu: {{ gpus_per_worker }}
env:
- name: HUGGING_FACE_HUB_TOKEN
value: "{{ huggingface_token }}"
command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
args:
- "--model"
- "{{ model_name_or_path }}"
- "--tensor-parallel-size"
- "{{ tensor_parallel_size }}"
volumeMounts:
- name: dshm
mountPath: /dev/shm
volumes:
- name: dshm
emptyDir:
medium: Memory

View File

@ -0,0 +1,14 @@
apiVersion: v1
kind: Service
metadata:
name: {{ vllm_service_name }}
namespace: {{ vllm_namespace }}
spec:
type: {{ vllm_service_type }}
ports:
- port: {{ vllm_port }}
targetPort: {{ vllm_port }}
protocol: TCP
name: http
selector:
app: {{ vllm_service_name }}

View File

@ -85,6 +85,7 @@
when:
- cloud_cli_prereqs_install_gcloud_cli | bool
- not ansible_check_mode
become: true
- name: Verify Azure CLI on Linux
ansible.builtin.command: az version

View File

@ -6,10 +6,10 @@ Reusable Ansible role for creating and updating Cloudflare DNS records in the `s
- Zone lookup by name, or direct `cloudflare_dns_zone_id`
- Create/update/delete of managed DNS records
- Token resolution from Ansible extra vars:
- Token resolution from Ansible extra vars, with the DNS-scoped token preferred:
- `-e CLOUDFLARE_DNS_API_TOKEN=...`
- `-e CLOUDFLARE_API_TOKEN=...`
- Environment-backed token resolution as fallback:
- Environment-backed token resolution as fallback, with the DNS-scoped token preferred:
- `CLOUDFLARE_DNS_API_TOKEN`
- `CLOUDFLARE_API_TOKEN`
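
A minimal sketch of the precedence described above, assuming the token is resolved with chained `default` filters. The fact name `cloudflare_dns_resolved_token` is hypothetical, and the role's actual task may differ:

```yaml
# Illustrative only: `cloudflare_dns_resolved_token` is an assumed name, not taken from the role.
# Order: extra var (DNS-scoped), extra var (general), env (DNS-scoped), env (general).
- name: Resolve Cloudflare API token (extra vars first, environment as fallback)
  ansible.builtin.set_fact:
    cloudflare_dns_resolved_token: >-
      {{
        CLOUDFLARE_DNS_API_TOKEN
        | default(CLOUDFLARE_API_TOKEN, true)
        | default(lookup('ansible.builtin.env', 'CLOUDFLARE_DNS_API_TOKEN'), true)
        | default(lookup('ansible.builtin.env', 'CLOUDFLARE_API_TOKEN'), true)
      }}
  no_log: true
```

Passing `true` as the second argument makes `default` skip empty strings as well as undefined values, so an unset environment variable falls through to the next candidate.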

View File

@ -78,7 +78,7 @@
- "'#zone:read' in (cloudflare_dns_zone_lookup.json.result[0].permissions | default([]))"
- "'#dns_records:edit' in (cloudflare_dns_zone_lookup.json.result[0].permissions | default([]))"
fail_msg: >-
CLOUDFLARE_API_TOKEN is valid but lacks DNS edit permission for {{ cloudflare_dns_zone_name }}.
CLOUDFLARE_DNS_API_TOKEN is valid but lacks DNS edit permission for {{ cloudflare_dns_zone_name }}.
Current permissions: {{ cloudflare_dns_zone_lookup.json.result[0].permissions | default([]) }}.
Required: Zone read + DNS edit on the svc.plus zone.
when:

View File

@ -8,6 +8,7 @@
cloudflare_dns_records: >-
{%- set records = [] -%}
{%- set source_specs = cloudflare_dns_source_hosts | default(cloudflare_dns_default_source_hosts, true) -%}
{%- set static_records = cloudflare_dns_static_records | default([], true) -%}
{%- set expanded_hosts = [] -%}
{%- for spec in source_specs -%}
{%- for host in query('inventory_hostnames', spec) -%}
@ -29,6 +30,9 @@
}) -%}
{%- endfor -%}
{%- endfor -%}
{%- for static_record in static_records -%}
{%- set _ = records.append(static_record) -%}
{%- endfor -%}
{{ records | to_json | from_yaml }}
- name: Reconcile svc.plus DNS via shared Cloudflare role

View File

@ -0,0 +1 @@
---

View File

@ -0,0 +1,16 @@
---
postgresql_image: "ghcr.io/x-evor/images/postgresql:17"
postgresql_compose_project_dir: "{{ '/opt/ai-workspace/postgres' if ansible_os_family != 'Darwin' else lookup('env', 'HOME') + '/ai-workspace-postgres' }}"
postgresql_compose_project_name: "ai-workspace-postgres"
postgresql_compose_file: "{{ postgresql_compose_project_dir }}/docker-compose.yml"
postgresql_compose_env_file: "{{ postgresql_compose_project_dir }}/.env"
postgresql_data_dir: "{{ postgresql_compose_project_dir }}/data"
postgresql_admin_user: postgres
postgresql_admin_password: "changeme"
postgresql_database: postgres
postgresql_port: 5432
postgresql_local_port: 15432
postgresql_container_uid: "999"
postgresql_container_gid: "999"

View File

@ -0,0 +1,100 @@
-- PostgreSQL initialization script
-- This script runs automatically on first container startup
-- Create extensions
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_jieba;
CREATE EXTENSION IF NOT EXISTS pgmq;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- Create a sample database for testing
CREATE DATABASE appdb;
-- Connect to the new database
\c appdb
-- Recreate extensions in the new database
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_jieba;
CREATE EXTENSION IF NOT EXISTS pgmq;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- Create a sample schema
CREATE SCHEMA IF NOT EXISTS app;
-- Sample table with vector embeddings
CREATE TABLE IF NOT EXISTS app.documents (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI ada-002 dimension
metadata JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Create indexes
CREATE INDEX IF NOT EXISTS idx_documents_embedding ON app.documents
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX IF NOT EXISTS idx_documents_metadata ON app.documents
USING gin (metadata);
CREATE INDEX IF NOT EXISTS idx_documents_content ON app.documents
USING gin (to_tsvector('english', content));
-- Sample table for node management
CREATE TABLE IF NOT EXISTS app.nodes (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name TEXT NOT NULL,
location TEXT NOT NULL,
address TEXT NOT NULL,
port INTEGER NOT NULL DEFAULT 443,
server_name TEXT,
protocols JSONB NOT NULL DEFAULT '[]'::jsonb,
available BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Index for available nodes
CREATE INDEX IF NOT EXISTS idx_nodes_available ON app.nodes (available);
-- Sample table with Chinese full-text search
CREATE TABLE IF NOT EXISTS app.articles_zh (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
title TEXT NOT NULL,
content TEXT NOT NULL,
tags TEXT[],
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_articles_zh_content ON app.articles_zh
USING gin (to_tsvector('jiebacfg', content));
-- Sample key-value store using hstore
CREATE TABLE IF NOT EXISTS app.sessions (
session_id TEXT PRIMARY KEY,
data hstore NOT NULL,
expires_at TIMESTAMP NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_sessions_expires ON app.sessions (expires_at);
-- Create a message queue
SELECT pgmq.create('task_queue');
SELECT pgmq.create('notification_queue');
-- Grant permissions (adjust as needed)
-- GRANT ALL PRIVILEGES ON SCHEMA app TO your_app_user;
-- GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA app TO your_app_user;
COMMENT ON DATABASE appdb IS 'Application database with vector search, full-text search, and message queue capabilities';
COMMENT ON SCHEMA app IS 'Main application schema';
COMMENT ON TABLE app.documents IS 'Documents with vector embeddings for semantic search';
COMMENT ON TABLE app.articles_zh IS 'Chinese articles with jieba tokenization';
COMMENT ON TABLE app.sessions IS 'Session storage using hstore';

View File

@ -0,0 +1,88 @@
# PostgreSQL Configuration
# Optimized for application workloads with vector search and full-text search
# Connection Settings
listen_addresses = '*'
port = 5432
max_connections = 100
superuser_reserved_connections = 3
# Memory Settings
shared_buffers = 256MB
effective_cache_size = 1GB
maintenance_work_mem = 64MB
work_mem = 16MB
# Write-Ahead Log
wal_buffers = 16MB
min_wal_size = 1GB
max_wal_size = 4GB
checkpoint_completion_target = 0.9
wal_compression = on
# Query Tuning
random_page_cost = 1.1 # Lower for SSD
effective_io_concurrency = 200
default_statistics_target = 100
# Logging
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_timezone = 'UTC'
# What to Log
log_checkpoints = on
log_connections = on
log_disconnections = on
log_duration = off
log_lock_waits = on
log_statement = 'none'
log_temp_files = 0
# Slow Query Logging
log_min_duration_statement = 1000 # Log queries slower than 1 second
# Locale and Formatting
datestyle = 'iso, mdy'
timezone = 'UTC'
lc_messages = 'en_US.utf8'
lc_monetary = 'en_US.utf8'
lc_numeric = 'en_US.utf8'
lc_time = 'en_US.utf8'
default_text_search_config = 'pg_catalog.english'
# Extension-specific settings
# pgvector settings
# No specific configuration needed, but ensure shared_buffers is adequate
# pg_jieba settings
# Default configuration is usually sufficient
# Full-text search
# Increase work_mem if doing complex text searches
# work_mem = 32MB # Uncomment if needed
# Performance for JSONB
# GIN indexes benefit from larger maintenance_work_mem during creation
# Connection Pooling (if using PgBouncer)
# Consider lowering max_connections and using PgBouncer
# Security
# ssl = on
# ssl_cert_file = '/path/to/server.crt'
# ssl_key_file = '/path/to/server.key'
# ssl_ca_file = '/path/to/ca.crt'
# Uncomment for production SSL/TLS
# ssl_prefer_server_ciphers = on
# ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
# Client Authentication
# Edit pg_hba.conf for detailed access control

View File

@ -1,5 +1,93 @@
---
# TODO: implement docker deployment tasks
- name: Placeholder task
debug:
msg: "Role placeholder. Implement docker deployment tasks."
- name: Ensure Homebrew Docker and Colima are installed (macOS)
ansible.builtin.command: brew install colima docker docker-compose
environment:
PATH: "/opt/homebrew/bin:/usr/local/bin:{{ ansible_env.PATH | default('/usr/bin:/bin') }}"
HOMEBREW_NO_AUTO_UPDATE: "1"
register: brew_install
changed_when: >-
'already installed' not in (brew_install.stderr | default(''))
and 'already installed' not in (brew_install.stdout | default(''))
when: ansible_os_family == 'Darwin'
- name: Ensure Colima is started (macOS)
ansible.builtin.command: colima start
environment:
PATH: "/opt/homebrew/bin:/usr/local/bin:{{ ansible_env.PATH | default('/usr/bin:/bin') }}"
register: colima_start
changed_when: "'already running' not in colima_start.stdout and 'already running' not in colima_start.stderr"
when: ansible_os_family == 'Darwin'
- name: Ensure PostgreSQL compose project directory exists
ansible.builtin.file:
path: "{{ postgresql_compose_project_dir }}"
state: directory
mode: "0755"
- name: Ensure PostgreSQL init-scripts directory exists
ansible.builtin.file:
path: "{{ postgresql_compose_project_dir }}/init-scripts"
state: directory
mode: "0755"
- name: Ensure PostgreSQL data directory exists
ansible.builtin.file:
path: "{{ postgresql_data_dir }}"
state: directory
mode: "0700"
owner: "{{ postgresql_container_uid }}"
group: "{{ postgresql_container_gid }}"
# On macOS, Colima usually maps volume mounts to the current user, so setting the owner to uid 999 (the postgres container user) may fail there
ignore_errors: "{{ ansible_os_family == 'Darwin' }}"
- name: Render PostgreSQL compose environment file
ansible.builtin.copy:
dest: "{{ postgresql_compose_env_file }}"
mode: "0600"
content: |
POSTGRES_DB={{ postgresql_database }}
POSTGRES_USER={{ postgresql_admin_user }}
POSTGRES_PASSWORD={{ postgresql_admin_password }}
no_log: true
- name: Render PostgreSQL compose file
ansible.builtin.template:
src: docker-compose.yml.j2
dest: "{{ postgresql_compose_file }}"
mode: "0644"
- name: Copy postgresql.conf
ansible.builtin.copy:
src: postgresql.conf
dest: "{{ postgresql_compose_project_dir }}/postgresql.conf"
mode: "0644"
- name: Copy init-extensions script
ansible.builtin.copy:
src: init-scripts/01-init-extensions.sql
dest: "{{ postgresql_compose_project_dir }}/init-scripts/01-init-extensions.sql"
mode: "0644"
- name: Start PostgreSQL compose service
ansible.builtin.command:
cmd: "docker compose -f {{ postgresql_compose_file }} -p {{ postgresql_compose_project_name }} up -d --remove-orphans"
chdir: "{{ postgresql_compose_project_dir }}"
register: postgresql_compose_up
changed_when: >-
'Started' in (postgresql_compose_up.stdout | default('')) or
'Created' in (postgresql_compose_up.stdout | default('')) or
'Recreated' in (postgresql_compose_up.stdout | default('')) or
'Pulled' in (postgresql_compose_up.stdout | default(''))
environment:
PATH: "/opt/homebrew/bin:/usr/local/bin:{{ ansible_env.PATH | default('/usr/bin:/bin') }}"
- name: Validate PostgreSQL compose service
ansible.builtin.command:
cmd: "docker exec {{ postgresql_compose_project_name }} pg_isready -U {{ postgresql_admin_user }} -d {{ postgresql_database }}"
register: postgresql_compose_ready
retries: 12
delay: 5
until: postgresql_compose_ready.rc == 0
changed_when: false
environment:
PATH: "/opt/homebrew/bin:/usr/local/bin:{{ ansible_env.PATH | default('/usr/bin:/bin') }}"

View File

@ -0,0 +1,45 @@
services:
postgres:
image: "{{ postgresql_image }}"
container_name: "{{ postgresql_compose_project_name }}"
restart: unless-stopped
env_file:
- "{{ postgresql_compose_env_file }}"
# PostgreSQL listens only on localhost; external access is provided through stunnel
# Ports are not exposed directly, so that every connection is TLS-encrypted
expose:
- "5432"
ports:
- "{{ postgresql_local_port }}:5432"
- "{{ postgresql_port }}:5432"
volumes:
- "{{ postgresql_data_dir }}:/var/lib/postgresql/data"
- ./init-scripts:/docker-entrypoint-initdb.d:ro
- ./postgresql.conf:/etc/postgresql/postgresql.conf:ro
healthcheck:
test: [ "CMD-SHELL", "pg_isready -U {{ postgresql_admin_user }} -h 127.0.0.1" ]
interval: 10s
timeout: 5s
retries: 5
start_period: 30s
networks:
- postgres_network
# Resource limits (adjust based on your needs)
deploy:
resources:
limits:
cpus: '2'
memory: 2G
reservations:
cpus: '1'
memory: 1G
networks:
postgres_network:
driver: bridge

View File

@ -8,6 +8,7 @@ zitadel_api_bind_host: 127.0.0.1
zitadel_api_port: 19080
zitadel_login_bind_host: 127.0.0.1
zitadel_login_port: 19081
zitadel_caddyfile_path: /etc/caddy/Caddyfile
zitadel_caddy_conf_dir: /etc/caddy/conf.d
zitadel_caddy_fragment_path: /etc/caddy/conf.d/zitadel.caddy
zitadel_caddy_base_dir: "{{ '/opt/homebrew/etc/caddy' if ansible_os_family == 'Darwin' else '/etc/caddy' }}"
zitadel_caddyfile_path: "{{ zitadel_caddy_base_dir }}/Caddyfile"
zitadel_caddy_conf_dir: "{{ zitadel_caddy_base_dir }}/conf.d"
zitadel_caddy_fragment_path: "{{ zitadel_caddy_base_dir }}/conf.d/zitadel.caddy"

View File

@ -5,8 +5,8 @@ This role manages GitHub Organization Rulesets to enforce branch protection and
## Governance Rules
### 1. Global Main Protection
- **Target:** `main` branch
- **Inclusion:** All repositories (`~ALL`)
- **Target:** `{{ github_target_branch }}` branch
- **Inclusion:** `{{ github_repository_name }}`
- **Rules:**
- Prevent deletion.
- Prevent force pushes (non-fast-forward).
@ -14,8 +14,8 @@ This role manages GitHub Organization Rulesets to enforce branch protection and
- Dismiss stale reviews on push.
### 2. Global Release Protection
- **Target:** `release/*` branches
- **Inclusion:** All repositories (`~ALL`)
- **Target:** `{{ github_release_branch_pattern }}` branches
- **Inclusion:** `{{ github_repository_name }}`
- **Rules:**
- Prevent deletion.
- Prevent force pushes.
@ -37,4 +37,22 @@ ansible-playbook apply-branch-protection.yml
## Configuration
- `github_org_name`: Defined in `defaults/main.yml`.
- `github_repository_name`: Optional repository scope. Defaults to `~ALL`.
- `github_target_branch`: Main branch target. Defaults to `main`.
- `github_release_branch_pattern`: Release branch pattern. Defaults to `release/*`.
- `github_rulesets`: Defined in `vars/main.yml`.
## Common usage
Target one repository and one release branch:
```bash
export GITHUB_TOKEN=your_admin_token
ansible-playbook apply-branch-protection.yml \
-e github_org_name=cloud-neutral \
-e github_repository_name=xstream-vpn \
-e github_target_branch=main \
-e github_release_branch_pattern=release/http3-quic-stable
```
If you want the rule to apply to all repositories in the organization, keep the default `github_repository_name=~ALL`.
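
The same overrides could also be kept in a vars file and passed with `-e @<file>`. The sketch below assumes a hypothetical file name, `rulesets-xstream-vpn.yml`; the values mirror the command above:

```yaml
# rulesets-xstream-vpn.yml (hypothetical file name, not part of the role)
# Usage: ansible-playbook apply-branch-protection.yml -e @rulesets-xstream-vpn.yml
github_org_name: cloud-neutral
github_repository_name: xstream-vpn
github_target_branch: main
github_release_branch_pattern: release/http3-quic-stable
```

With these values, the main ruleset would target `refs/heads/main` and the release ruleset `refs/heads/release/http3-quic-stable`, both scoped to the `xstream-vpn` repository only.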

View File

@ -1,4 +1,7 @@
---
github_org_name: "cloud-neutral"
owner: "{{ github_org_name }}"
repo: ""
github_repository_name: "~ALL"
github_target_branch: "main"
github_release_branch_pattern: "release/*"

View File

@ -6,11 +6,11 @@ github_rulesets:
conditions:
ref_name:
include:
- "refs/heads/main"
- "refs/heads/{{ github_target_branch }}"
exclude: []
repository_name:
include:
- "~ALL"
- "{{ github_repository_name }}"
exclude: []
protected: false
rules:
@ -30,11 +30,11 @@ github_rulesets:
conditions:
ref_name:
include:
- "refs/heads/release/*"
- "refs/heads/{{ github_release_branch_pattern }}"
exclude: []
repository_name:
include:
- "~ALL"
- "{{ github_repository_name }}"
exclude: []
protected: false
rules:
@ -47,4 +47,4 @@ github_rulesets:
dismiss_stale_reviews_on_push: true
require_code_owner_reviews: false
require_last_push_approval: false
required_review_thread_resolution: false

View File

@ -1,3 +1,10 @@
- name: Update apt cache
apt:
update_cache: true
cache_valid_time: 3600
# Skip the cache update in check mode; only check packages
when: not ansible_check_mode
- name: Install prerequisites for OpenResty
apt:
name:
@ -5,7 +12,6 @@
- gnupg
- apt-transport-https
state: present
update_cache: true
- name: Import OpenResty GPG key
shell: |
@ -26,7 +32,6 @@
apt:
name: openresty
state: present
update_cache: true
- name: Ensure sites-available directory exists
file:
@ -70,6 +75,7 @@
name: openresty
enabled: true
state: started
# The systemd module is check-mode safe under -C; it does not actually start or stop the service
- name: Verify OpenResty core API
shell: |
@ -80,3 +86,5 @@
retries: 5
delay: 3
until: openresty_verify.rc == 0
# Check mode does not restart the service, so skip verification to avoid false failures
when: not ansible_check_mode

View File

@ -2,6 +2,9 @@
accounts_service_image_ref: "{{ accounts_service_image_repo }}:{{ accounts_service_image_tag }}"
accounts_service_image_repo: ghcr.io/x-evor/accounts
accounts_service_image_tag: latest
accounts_service_registry_server: "{{ lookup('ansible.builtin.env', 'GHCR_REGISTRY') | default('ghcr.io', true) }}"
accounts_service_registry_username: "{{ lookup('ansible.builtin.env', 'GHCR_USERNAME') | default('', true) }}"
accounts_service_registry_password: "{{ lookup('ansible.builtin.env', 'GHCR_PASSWORD') | default(lookup('ansible.builtin.env', 'GHCR_TOKEN') | default('', true), true) }}"
accounts_service_pull_image: true
accounts_service_container_port: 8080
accounts_service_base_dir: /opt/cloud-neutral/accounts/managed
@ -9,9 +12,10 @@ accounts_service_shared_network: cn-toolkit-shared
accounts_service_dns_servers:
- 1.1.1.1
- 8.8.8.8
accounts_service_caddyfile_path: /etc/caddy/Caddyfile
accounts_service_caddy_conf_dir: /etc/caddy/conf.d
accounts_service_caddy_fragment_path: /etc/caddy/conf.d/accounts-contabo-e700175.caddy
accounts_service_caddy_base_dir: "{{ '/opt/homebrew/etc/caddy' if ansible_os_family == 'Darwin' else '/etc/caddy' }}"
accounts_service_caddyfile_path: "{{ accounts_service_caddy_base_dir }}/Caddyfile"
accounts_service_caddy_conf_dir: "{{ accounts_service_caddy_base_dir }}/conf.d"
accounts_service_caddy_fragment_path: "{{ accounts_service_caddy_base_dir }}/conf.d/accounts-contabo-e700175.caddy"
accounts_service_caddy_sites:
- server_names:
- accounts.svc.plus
@ -24,6 +28,9 @@ accounts_service_env_defaults:
DB_PASSWORD: ""
DB_PORT: "15432"
DB_USER: svcplus_vps
BRIDGE_AUTH_TOKEN: "{{ lookup('ansible.builtin.env', 'BRIDGE_AUTH_TOKEN') | default('', true) }}"
BRIDGE_REVIEW_AUTH_TOKEN: "{{ lookup('ansible.builtin.env', 'BRIDGE_REVIEW_AUTH_TOKEN') | default('', true) }}"
BRIDGE_SERVER_URL: "{{ lookup('ansible.builtin.env', 'BRIDGE_SERVER_URL') | default('https://xworkmate-bridge.svc.plus', true) }}"
INTERNAL_SERVICE_TOKEN: ""
POSTGRES_PASSWORD: ""
POSTGRES_USER: svcplus_vps

View File

@ -14,6 +14,7 @@
owner: root
group: root
mode: "0755"
become: true
- name: Deploy managed accounts Caddy fragment
ansible.builtin.template:
@ -23,12 +24,25 @@
group: root
mode: "0644"
notify: Reload caddy
become: true
- name: Ensure Caddy is enabled and running for accounts service
ansible.builtin.systemd:
name: caddy
enabled: true
state: started
become: true
- name: Log into container registry for accounts service
ansible.builtin.shell: |
set -euo pipefail
printf '%s' {{ accounts_service_registry_password | quote }} | docker login {{ accounts_service_registry_server | quote }} -u {{ accounts_service_registry_username | quote }} --password-stdin
args:
executable: /bin/bash
no_log: true
when:
- accounts_service_registry_username | length > 0
- accounts_service_registry_password | length > 0
- name: Ensure shared Docker network exists for accounts service
ansible.builtin.command: docker network inspect "{{ accounts_service_shared_network }}"

View File

@ -68,6 +68,22 @@
state: present
insertafter: '^CONFIG_TEMPLATE='
- name: Ensure managed xworkmate bridge env is present for {{ accounts_service_target.name }}
ansible.builtin.lineinfile:
path: "{{ accounts_service_target.env_file }}"
regexp: "^{{ item.key }}="
line: "{{ item.key }}={{ item.value }}"
state: present
insertafter: '^IMAGE='
loop:
- key: BRIDGE_AUTH_TOKEN
value: "{{ accounts_service_env_defaults.BRIDGE_AUTH_TOKEN }}"
- key: BRIDGE_REVIEW_AUTH_TOKEN
value: "{{ accounts_service_env_defaults.BRIDGE_REVIEW_AUTH_TOKEN }}"
- key: BRIDGE_SERVER_URL
value: "{{ accounts_service_env_defaults.BRIDGE_SERVER_URL }}"
no_log: true
- name: Render managed account config for {{ accounts_service_target.name }}
ansible.builtin.template:
src: account.yaml.j2

@@ -7,6 +7,9 @@ DB_NAME={{ accounts_service_env_defaults.DB_NAME }}
DB_PASSWORD={{ accounts_service_env_defaults.DB_PASSWORD }}
DB_PORT={{ accounts_service_env_defaults.DB_PORT }}
DB_USER={{ accounts_service_env_defaults.DB_USER }}
BRIDGE_AUTH_TOKEN={{ accounts_service_env_defaults.BRIDGE_AUTH_TOKEN }}
BRIDGE_REVIEW_AUTH_TOKEN={{ accounts_service_env_defaults.BRIDGE_REVIEW_AUTH_TOKEN }}
BRIDGE_SERVER_URL={{ accounts_service_env_defaults.BRIDGE_SERVER_URL }}
INTERNAL_SERVICE_TOKEN={{ accounts_service_env_defaults.INTERNAL_SERVICE_TOKEN }}
POSTGRES_PASSWORD={{ accounts_service_env_defaults.POSTGRES_PASSWORD }}
POSTGRES_USER={{ accounts_service_env_defaults.POSTGRES_USER }}

@@ -1,27 +0,0 @@
---
acp_bridge_server_service_name: xworkmate-acp-bridge-server
acp_bridge_server_service_user: root
acp_bridge_server_service_group: root
acp_bridge_server_workdir: /root
acp_bridge_server_listen_host: 127.0.0.1
acp_bridge_server_listen_port: 8787
acp_bridge_server_binary_path: /usr/local/bin/xworkmate-acp-bridge-server
acp_bridge_server_local_source_dir: "{{ playbook_dir }}/../xworkmate/go/go_core"
acp_bridge_server_local_build_dir: "{{ playbook_dir }}/.artifacts/acp_bridge_server"
acp_bridge_server_local_binary_path: "{{ acp_bridge_server_local_build_dir }}/xworkmate-acp-bridge-server"
acp_bridge_server_build_goos: linux
acp_bridge_server_build_goarch: amd64
acp_bridge_server_domain: acp-server.svc.plus
acp_bridge_server_public_base_url: https://acp-server.svc.plus
acp_bridge_server_caddyfile_path: /etc/caddy/Caddyfile
acp_bridge_server_caddy_conf_dir: /etc/caddy/conf.d
acp_bridge_server_caddy_fragment_path: /etc/caddy/conf.d/acp-server-bridge.caddy
acp_bridge_server_obsolete_caddy_fragment_paths:
  - /etc/caddy/conf.d/acp-server-bridge-server.caddy
acp_bridge_server_allowed_origins:
  - https://xworkmate.svc.plus
  - http://localhost:*
  - http://127.0.0.1:*
acp_bridge_server_enable_ufw: true
acp_bridge_server_packages:
  - caddy

@@ -1,65 +0,0 @@
---
- name: Install ACP bridge server prerequisites
  ansible.builtin.package:
    name: "{{ acp_bridge_server_packages }}"
    state: present

- name: Ensure local ACP bridge build directory exists
  ansible.builtin.file:
    path: "{{ acp_bridge_server_local_build_dir }}"
    state: directory
    mode: "0755"
  delegate_to: localhost
  become: false

- name: Build XWorkmate ACP bridge server locally
  ansible.builtin.command:
    cmd: go build -o "{{ acp_bridge_server_local_binary_path }}" .
    chdir: "{{ acp_bridge_server_local_source_dir }}"
  environment:
    GOOS: "{{ acp_bridge_server_build_goos }}"
    GOARCH: "{{ acp_bridge_server_build_goarch }}"
    CGO_ENABLED: "0"
  delegate_to: localhost
  become: false

- name: Upload XWorkmate ACP bridge server binary
  ansible.builtin.copy:
    src: "{{ acp_bridge_server_local_binary_path }}"
    dest: "{{ acp_bridge_server_binary_path }}"
    owner: root
    group: root
    mode: "0755"
  notify: Restart acp bridge server

- name: Deploy ACP bridge Caddy fragment
  ansible.builtin.template:
    src: acp-bridge-server.caddy.j2
    dest: "{{ acp_bridge_server_caddy_fragment_path }}"
    owner: root
    group: root
    mode: "0644"
  notify: Reload caddy

- name: Remove deprecated standalone ACP bridge Caddy fragments
  ansible.builtin.file:
    path: "{{ item }}"
    state: absent
  loop: "{{ acp_bridge_server_obsolete_caddy_fragment_paths }}"
  notify: Reload caddy

- name: Deploy ACP bridge systemd service
  ansible.builtin.template:
    src: acp-bridge-server.service.j2
    dest: "/etc/systemd/system/{{ acp_bridge_server_service_name }}.service"
    owner: root
    group: root
    mode: "0644"
  notify: Restart acp bridge server

- name: Ensure ACP bridge server is enabled and running
  ansible.builtin.systemd:
    name: "{{ acp_bridge_server_service_name }}"
    enabled: true
    state: started
    daemon_reload: true

@@ -1,4 +0,0 @@
{{ acp_bridge_server_domain }} {
    encode zstd gzip
    reverse_proxy {{ acp_bridge_server_listen_host }}:{{ acp_bridge_server_listen_port }}
}

@@ -1,17 +0,0 @@
[Unit]
Description=XWorkmate ACP Bridge Server
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User={{ acp_bridge_server_service_user }}
Group={{ acp_bridge_server_service_group }}
WorkingDirectory={{ acp_bridge_server_workdir }}
ExecStart={{ acp_bridge_server_binary_path }} serve --config /etc/xworkmate/xworkmate-acp-bridge-server.yaml --listen {{ acp_bridge_server_listen_host }}:{{ acp_bridge_server_listen_port }}
Restart=on-failure
RestartSec=2
NoNewPrivileges=yes

[Install]
WantedBy=multi-user.target

Some files were not shown because too many files have changed in this diff.