fix(docker): use system Node in componentized builders + retry apk add (#28888)

* fix(docker): use system Node in componentized builders + retry apk add

Two failure modes in the componentized image builds (backend, migrations,
gateway) on project-releaser, each with its own root cause:

1. The builder-stage `apk add` was missing `libatomic`. `prisma generate`
   triggers prisma-client-py's `nodeenv`, which downloads the latest stable
   Node.js at build time. Node 26.1.0 (last passing build on 2026-05-20) did
   not dynamically link `libatomic.so.1`. Node 26.2.0 (current latest) does,
   and the Wolfi builder doesn't ship libatomic — so `npm install prisma@…`
   fails with `node: error while loading shared libraries: libatomic.so.1`
   and exit 127. Retrying or pinning the Node version is a treadmill; the
   root issue is that nodeenv decides the Node version at build time.

   Fix: add `nodejs npm` to the builder-stage `apk add` so prisma-client-py
   uses Wolfi's own Node via its default `PRISMA_USE_GLOBAL_NODE=true`. The
   legacy `docker/Dockerfile.non_root` already does this; the componentized
   Dockerfiles regressed it. `PRISMA_USE_GLOBAL_NODE=true` is also set
   explicitly in ENV; it matches the default, but it pins the intent so a
   future env override can't silently re-enable nodeenv's download.

2. Transient `apk.cgr.dev` mirror flakes during the arm64 leg of multi-arch
   builds cause individual package fetches to fail mid-install (we saw
   `nss-db-2.43-r7: remote server returned error (try 'apk update')` and
   similar for libzstd1, libogg, binutils in this run). None of the
   componentized Dockerfiles wrap `apk add` in a retry loop.

   Fix: wrap every `apk add` (builder + runtime, all three files) in the
   same `for i in 1 2 3; do … && break || sleep 5; done` loop that the
   legacy `docker/Dockerfile.non_root` already uses.

Affected files all have the same shape — backend, migrations, gateway —
because they're three near-identical componentizations of the original
monolithic proxy Dockerfile.
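
For reference, a minimal sketch (not part of this change) of how the
missing-library symptom from item 1 could be confirmed from `ldd` output.
The helper name `missing_libs` is hypothetical, and the sketch assumes
glibc-style `ldd` lines of the form `libfoo.so.1 => not found`. It runs on
canned input here so it does not depend on any particular Node install.

```shell
# missing_libs: read `ldd` output on stdin and print the name of every
# shared library the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Canned input imitating `ldd` output for a binary that needs libatomic
# on an image that does not ship it.
printf '\tlibatomic.so.1 => not found\n\tlibc.so.6 => /lib/libc.so.6 (0x0000)\n' \
    | missing_libs
```

Against a real binary the call would look like `ldd /path/to/node | missing_libs`,
where the path is wherever the downloaded Node ended up (an assumption; the
location is not stated in this commit).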

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(docker): trim verbose comments on builder Node setup

Same fix, leaner comments. The apk-add note is 3 lines now (was 8), and the
PRISMA_USE_GLOBAL_NODE bullet matches the existing UV_* comment style.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): make apk-add retry loop fail loudly on exhaustion

Greptile flagged that the retry pattern `apk add ... && break || sleep 5`
exits 0 when all three attempts fail, because `sleep 5` is the last
executed command. A persistent apk.cgr.dev outage would produce a silently
"successful" RUN layer with no packages installed, followed by cryptic
"command not found" errors in downstream RUN steps.

Fix: explicitly fail on the third miss before sleeping. Same pattern in
all six retry loops (3 files × builder + runtime).
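
The difference between the two loop shapes can be reproduced outside Docker.
This is an illustrative sketch, not the committed code: `always_fail` is a
hypothetical stand-in for an `apk add` that never succeeds, `sleep 0`
replaces `sleep 5` to keep it fast, and `return 1` is used where the
Dockerfile uses `exit 1` so the calling shell survives.

```shell
# Stand-in for `apk add` during a persistent mirror outage.
always_fail() { return 1; }

# Original pattern: after the last failed attempt, `sleep` is the final
# command executed and it succeeds, so the loop reports success.
naive_retry() {
    for i in 1 2 3; do
        always_fail && break || sleep 0
    done
}

# Fixed pattern: give up with a non-zero status on the third miss.
strict_retry() {
    for i in 1 2 3; do
        always_fail && break
        [ "$i" = 3 ] && { echo "failed after 3 attempts" >&2; return 1; }
        sleep 0
    done
}

naive_retry && echo "naive: reported success"
strict_retry || echo "strict: reported failure"
```

Both messages print: the naive loop hides the outage, while the strict loop
surfaces it at the step that actually failed.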

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MBP.localdomain>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yassin Kortam 2026-05-26 15:41:38 -07:00 committed by GitHub
parent 13512e7abd
commit a645d464e6
3 changed files with 46 additions and 6 deletions

View File

@@ -12,17 +12,27 @@ USER root
 COPY --from=uvbin /uv /uvx /usr/local/bin/
-RUN apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile
+# nodejs/npm so `prisma generate` uses Wolfi's Node via PRISMA_USE_GLOBAL_NODE
+# instead of nodeenv downloading one whose dynamic deps may not be in Wolfi
+# (e.g. Node 26.2.0 needs libatomic). Retry for transient apk.cgr.dev flakes.
+RUN for i in 1 2 3; do \
+apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile nodejs npm && break; \
+[ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+sleep 5; \
+done
 # UV_COMPILE_BYTECODE=1 precompiles .pyc at install time → faster cold start.
 # UV_LINK_MODE=copy avoids hardlink warnings when uv installs from a
 # BuildKit cache mount (different filesystem).
 # UV_PYTHON_DOWNLOADS=0 force uv to use the apk-installed CPython instead of
 # silently pulling a managed interpreter.
+# PRISMA_USE_GLOBAL_NODE explicit (matches default) so an env override can't
+# silently re-enable nodeenv's Node download.
 ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
 UV_LINK_MODE=copy \
 UV_COMPILE_BYTECODE=1 \
 UV_PYTHON_DOWNLOADS=0 \
+PRISMA_USE_GLOBAL_NODE=true \
 PATH="/app/.venv/bin:${PATH}"
 # Stage 1 — install dependencies only.
@@ -58,7 +68,11 @@ FROM $LITELLM_RUNTIME_IMAGE AS runtime
 USER root
-RUN apk add --no-cache bash openssl tzdata python3 libsndfile libatomic
+RUN for i in 1 2 3; do \
+apk add --no-cache bash openssl tzdata python3 libsndfile libatomic && break; \
+[ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+sleep 5; \
+done
 # wolfi-base ships an unprivileged `nonroot` account (UID/GID 65532) with
 # /home/nonroot. We run the backend as that user

View File

@@ -12,17 +12,27 @@ USER root
 COPY --from=uvbin /uv /uvx /usr/local/bin/
-RUN apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile
+# nodejs/npm so `prisma generate` uses Wolfi's Node via PRISMA_USE_GLOBAL_NODE
+# instead of nodeenv downloading one whose dynamic deps may not be in Wolfi
+# (e.g. Node 26.2.0 needs libatomic). Retry for transient apk.cgr.dev flakes.
+RUN for i in 1 2 3; do \
+apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile nodejs npm && break; \
+[ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+sleep 5; \
+done
 # UV_COMPILE_BYTECODE=1 precompiles .pyc at install time → faster cold start.
 # UV_LINK_MODE=copy avoids hardlink warnings when uv installs from a
 # BuildKit cache mount (different filesystem).
 # UV_PYTHON_DOWNLOADS=0 force uv to use the apk-installed CPython instead of
 # silently pulling a managed interpreter.
+# PRISMA_USE_GLOBAL_NODE explicit (matches default) so an env override can't
+# silently re-enable nodeenv's Node download.
 ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
 UV_LINK_MODE=copy \
 UV_COMPILE_BYTECODE=1 \
 UV_PYTHON_DOWNLOADS=0 \
+PRISMA_USE_GLOBAL_NODE=true \
 PATH="/app/.venv/bin:${PATH}"
 # Stage 1 — install dependencies only.
@@ -58,7 +68,11 @@ FROM $LITELLM_RUNTIME_IMAGE AS runtime
 USER root
-RUN apk add --no-cache bash openssl tzdata python3 libsndfile libatomic
+RUN for i in 1 2 3; do \
+apk add --no-cache bash openssl tzdata python3 libsndfile libatomic && break; \
+[ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+sleep 5; \
+done
 # wolfi-base ships an unprivileged `nonroot` account (UID/GID 65532) with
 # /home/nonroot. We run the proxy as that user.

View File

@@ -31,12 +31,20 @@ USER root
 COPY --from=uvbin /uv /uvx /usr/local/bin/
-RUN apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile
+# nodejs/npm so `prisma generate` uses Wolfi's Node via PRISMA_USE_GLOBAL_NODE
+# instead of nodeenv downloading one whose dynamic deps may not be in Wolfi
+# (e.g. Node 26.2.0 needs libatomic). Retry for transient apk.cgr.dev flakes.
+RUN for i in 1 2 3; do \
+apk add --no-cache bash gcc python3 python3-dev openssl openssl-dev libsndfile nodejs npm && break; \
+[ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+sleep 5; \
+done
 ENV UV_PROJECT_ENVIRONMENT=/app/.venv \
 UV_LINK_MODE=copy \
 UV_COMPILE_BYTECODE=1 \
 UV_PYTHON_DOWNLOADS=0 \
+PRISMA_USE_GLOBAL_NODE=true \
 PATH="/app/.venv/bin:${PATH}"
 # Stage 1 — install third-party deps only (cached by pyproject.toml/uv.lock).
@@ -78,7 +86,11 @@ FROM $LITELLM_RUNTIME_IMAGE AS runtime
 USER root
-RUN apk add --no-cache bash openssl tzdata python3 libsndfile libatomic
+RUN for i in 1 2 3; do \
+apk add --no-cache bash openssl tzdata python3 libsndfile libatomic && break; \
+[ $i = 3 ] && { echo "apk add failed after 3 retries" >&2; exit 1; }; \
+sleep 5; \
+done
 # wolfi-base ships an unprivileged `nonroot` account (UID/GID 65532). The
 # Prisma engine binaries are dynamically linked against libssl/libcrypto, so