From a645d464e69575f731212a44725cbd9ad011d3c4 Mon Sep 17 00:00:00 2001
From: Yassin Kortam
Date: Tue, 26 May 2026 15:41:38 -0700
Subject: [PATCH] fix(docker): use system Node in componentized builders + retry apk add (#28888)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* fix(docker): use system Node in componentized builders + retry apk add

Two failure modes in the componentized image builds (backend,
migrations, gateway) on project-releaser, with the same root cause:

1. The builder-stage `apk add` was missing `libatomic`.
   `prisma generate` triggers prisma-client-py's `nodeenv`, which
   downloads the latest stable Node.js at build time. Node 26.1.0 (last
   passing build on 2026-05-20) did not dynamically link
   `libatomic.so.1`. Node 26.2.0 (current latest) does, and the Wolfi
   builder doesn't ship libatomic — so `npm install prisma@…` fails with
   `node: error while loading shared libraries: libatomic.so.1` and
   exit 127. Retrying or pinning the Node version is a treadmill; the
   root issue is that nodeenv decides the Node version at build time.

Fix: add `nodejs npm` to the builder-stage `apk add` so
prisma-client-py uses Wolfi's own Node via its default
`PRISMA_USE_GLOBAL_NODE=true`. The legacy `docker/Dockerfile.non_root`
already does this; the componentized Dockerfiles regressed it. Setting
`PRISMA_USE_GLOBAL_NODE=true` in ENV redundantly nails the intent so a
future env override can't silently re-enable nodeenv's download.

2. Transient `apk.cgr.dev` mirror flakes during the arm64 leg of
   multi-arch builds cause individual package fetches to fail
   mid-install (we saw `nss-db-2.43-r7: remote server returned error
   (try 'apk update')` and similar for libzstd1, libogg, binutils in
   this run). None of the componentized Dockerfiles wrap `apk add` in a
   retry loop.

Fix: wrap every `apk add` (builder + runtime, all three files) in the
same `for i in 1 2 3; do … && break || sleep 5; done` loop that the
legacy `docker/Dockerfile.non_root` already uses.

Affected files all have the same shape — backend, migrations, gateway —
because they're three near-identical componentizations of the original
monolithic proxy Dockerfile.

Co-Authored-By: Claude Opus 4.7 (1M context)

* chore(docker): trim verbose comments on builder Node setup

Same fix, leaner comments. The apk-add note is 3 lines now (was 8), and
the PRISMA_USE_GLOBAL_NODE bullet matches the existing UV_* comment
style.

Co-Authored-By: Claude Opus 4.7 (1M context)

* fix(docker): make apk-add retry loop fail loudly on exhaustion

Greptile flagged that the retry pattern `apk add ... && break || sleep 5`
exits 0 when all three attempts fail, because `sleep 5` is the last
executed command. A persistent apk.cgr.dev outage would produce a
silently "successful" RUN layer with no packages installed, followed by
cryptic "command not found" errors in downstream RUN steps.

Fix: explicitly fail on the third miss before sleeping. Same pattern in
all six retry loops (3 files × builder + runtime).

Co-Authored-By: Claude Opus 4.7 (1M context)

---------

Co-authored-by: Yassin Kortam
Co-authored-by: Claude Opus 4.7 (1M context)
---
 backend/Dockerfile    | 18 ++++++++++++++++--
 gateway/Dockerfile    | 18 ++++++++++++++++--
 migrations/Dockerfile | 16 ++++++++++++++--
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/backend/Dockerfile b/backend/Dockerfile
index c08014fc0e..2cfdde8a51 100644
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@@ -12,17 +12,27 @@ USER root
 
 COPY --from=uvbin /uv /uvx /usr/local/bin/
 
-RUN apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile
+# nodejs/npm so `prisma generate` uses Wolfi's Node via PRISMA_USE_GLOBAL_NODE
+# instead of nodeenv downloading one whose dynamic deps may not be in Wolfi
+# (e.g. Node 26.2.0 needs libatomic). Retry for transient apk.cgr.dev flakes.
+RUN for i in 1 2 3; do \
+        apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile nodejs npm && break; \
+        [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+        sleep 5; \
+    done
 
 # UV_COMPILE_BYTECODE=1 precompiles .pyc at install time → faster cold start.
 # UV_LINK_MODE=copy avoids hardlink warnings when uv installs from a
 # BuildKit cache mount (different filesystem).
 # UV_PYTHON_DOWNLOADS=0 force uv to use the apk-installed CPython instead of
 # silently pulling a managed interpreter.
+# PRISMA_USE_GLOBAL_NODE explicit (matches default) so an env override can't
+# silently re-enable nodeenv's Node download.
 ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
     UV_LINK_MODE=copy \
     UV_COMPILE_BYTECODE=1 \
     UV_PYTHON_DOWNLOADS=0 \
+    PRISMA_USE_GLOBAL_NODE=true \
     PATH="/app/.venv/bin:${PATH}"
 
 # Stage 1 — install dependencies only.
@@ -58,7 +68,11 @@ FROM $LITELLM_RUNTIME_IMAGE AS runtime
 
 USER root
 
-RUN apk add --no-cache bash openssl tzdata python3 libsndfile libatomic
+RUN for i in 1 2 3; do \
+        apk add --no-cache bash openssl tzdata python3 libsndfile libatomic && break; \
+        [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+        sleep 5; \
+    done
 
 # wolfi-base ships an unprivileged `nonroot` account (UID/GID 65532) with
 # /home/nonroot. We run the backend as that user
diff --git a/gateway/Dockerfile b/gateway/Dockerfile
index a2ca3d3f83..19c8a10fdf 100644
--- a/gateway/Dockerfile
+++ b/gateway/Dockerfile
@@ -12,17 +12,27 @@ USER root
 
 COPY --from=uvbin /uv /uvx /usr/local/bin/
 
-RUN apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile
+# nodejs/npm so `prisma generate` uses Wolfi's Node via PRISMA_USE_GLOBAL_NODE
+# instead of nodeenv downloading one whose dynamic deps may not be in Wolfi
+# (e.g. Node 26.2.0 needs libatomic). Retry for transient apk.cgr.dev flakes.
+RUN for i in 1 2 3; do \
+        apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile nodejs npm && break; \
+        [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+        sleep 5; \
+    done
 
 # UV_COMPILE_BYTECODE=1 precompiles .pyc at install time → faster cold start.
 # UV_LINK_MODE=copy avoids hardlink warnings when uv installs from a
 # BuildKit cache mount (different filesystem).
 # UV_PYTHON_DOWNLOADS=0 force uv to use the apk-installed CPython instead of
 # silently pulling a managed interpreter.
+# PRISMA_USE_GLOBAL_NODE explicit (matches default) so an env override can't
+# silently re-enable nodeenv's Node download.
 ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
     UV_LINK_MODE=copy \
     UV_COMPILE_BYTECODE=1 \
     UV_PYTHON_DOWNLOADS=0 \
+    PRISMA_USE_GLOBAL_NODE=true \
     PATH="/app/.venv/bin:${PATH}"
 
 # Stage 1 — install dependencies only.
@@ -58,7 +68,11 @@ FROM $LITELLM_RUNTIME_IMAGE AS runtime
 
 USER root
 
-RUN apk add --no-cache bash openssl tzdata python3 libsndfile libatomic
+RUN for i in 1 2 3; do \
+        apk add --no-cache bash openssl tzdata python3 libsndfile libatomic && break; \
+        [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+        sleep 5; \
+    done
 
 # wolfi-base ships an unprivileged `nonroot` account (UID/GID 65532) with
 # /home/nonroot. We run the proxy as that user.
diff --git a/migrations/Dockerfile b/migrations/Dockerfile
index 2160514251..a78a4e2225 100644
--- a/migrations/Dockerfile
+++ b/migrations/Dockerfile
@@ -31,12 +31,20 @@ USER root
 
 COPY --from=uvbin /uv /uvx /usr/local/bin/
 
-RUN apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile
+# nodejs/npm so `prisma generate` uses Wolfi's Node via PRISMA_USE_GLOBAL_NODE
+# instead of nodeenv downloading one whose dynamic deps may not be in Wolfi
+# (e.g. Node 26.2.0 needs libatomic). Retry for transient apk.cgr.dev flakes.
+RUN for i in 1 2 3; do \
+        apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile nodejs npm && break; \
+        [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+        sleep 5; \
+    done
 
 ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
     UV_LINK_MODE=copy \
     UV_COMPILE_BYTECODE=1 \
     UV_PYTHON_DOWNLOADS=0 \
+    PRISMA_USE_GLOBAL_NODE=true \
     PATH="/app/.venv/bin:${PATH}"
 
 # Stage 1 — install third-party deps only (cached by pyproject.toml/uv.lock).
@@ -78,7 +86,11 @@ FROM $LITELLM_RUNTIME_IMAGE AS runtime
 
 USER root
 
-RUN apk add --no-cache bash openssl tzdata python3 libsndfile libatomic
+RUN for i in 1 2 3; do \
+        apk add --no-cache bash openssl tzdata python3 libsndfile libatomic && break; \
+        [ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+        sleep 5; \
+    done
 
 # wolfi-base ships an unprivileged `nonroot` account (UID/GID 65532). The
 # Prisma engine binaries are dynamically linked against libssl/libcrypto, so