diff --git a/docs/cert-manager-arch.md b/docs/cert-manager-arch.md
new file mode 100644
index 0000000..84287bc
--- /dev/null
+++ b/docs/cert-manager-arch.md
@@ -0,0 +1,77 @@
+# cert-manager Architecture
+
+This document records the complete certificate control-plane contract for the `svc.plus` platform.
+
+## Scope
+
+The system is split into four distinct responsibilities:
+
+- `cert-manager` owns certificate issuance, renewal, and the target `Secret` objects.
+- `Caddy` remains the ingress surface and serves HTTP-01 challenge traffic.
+- `external-dns` only manages DNS records for public hostnames.
+- `external-secrets` continues to materialize Vault-sourced application secrets, AK/SK pairs, future provider credentials such as the Cloudflare API token, and image pull secrets.
+
+## Default Contract
+
+- `postgresql-prod.svc.plus` defaults to `cert-manager + ACME HTTP-01`.
+- `DNS-01 + Cloudflare` is predeclared for wildcard certificates and future subdomains.
+- `selfSigned` remains available as an internal temporary or recovery fallback.
+- `cert-manager` owns `postgresql-tls` in every namespace that consumes it, so there is no cross-namespace Secret sync job.
+
+## System Diagram
+
+```mermaid
+flowchart LR
+  Vault[(Vault)]
+  ESO[external-secrets]
+  CloudflareToken[(cloudflare-api-token Secret)]
+  ExternalDNS[external-dns]
+  DNSZone[(svc.plus DNS zone)]
+  Caddy[Caddy Ingress]
+  CertMgr[cert-manager]
+  Http01["ACME HTTP-01"]
+  Dns01["ACME DNS-01 + Cloudflare"]
+  SelfSigned["selfSigned fallback"]
+  PlatformCert["platform/postgresql-tls Certificate"]
+  PlatformSecret["platform/postgresql-tls Secret"]
+  DatabaseCert["database/postgresql-tls Certificate"]
+  DatabaseSecret["database/postgresql-tls Secret"]
+  PostgreSQL["postgresql-prod.svc.plus"]
+  Stunnel["database/stunnel-server"]
+
+  Vault --> ESO
+  ESO --> CloudflareToken
+  CloudflareToken --> Dns01
+  ExternalDNS --> DNSZone
+  DNSZone --> Caddy
+  Caddy --> Http01
+  Http01 --> CertMgr
+  Dns01 --> CertMgr
+  SelfSigned -. fallback .-> CertMgr
+  CertMgr --> PlatformCert
+  CertMgr --> DatabaseCert
+  PlatformCert --> PlatformSecret
+  DatabaseCert --> DatabaseSecret
+  PlatformSecret --> Caddy
+  DatabaseSecret --> Stunnel
+  Caddy --> PostgreSQL
+```
+
+## Operational Rules
+
+- Keep `cert-manager` as the source of truth for TLS Secret ownership.
+- Keep `Caddy` as the traffic and HTTP-01 routing layer only.
+- Keep `external-dns` focused on DNS record reconciliation.
+- Keep `external-secrets` focused on external secret materialization.
+- Treat the Cloudflare API token as an external input secret; it can be bootstrapped manually or delivered by `external-secrets` when that path is wired in.
+- Prefer namespace-local `Certificate` objects for each consumer namespace.
+- Avoid cross-namespace certificate copying or Secret sync controllers.
+
+## Related Playbook Roles
+
+- `vhosts/k3s_platform_bootstrap`
+  - installs the platform node and prepares GitOps handoff
+- `vhosts/k3s_platform_addon`
+  - installs shared platform services such as `cert-manager`, `external-secrets`, `caddy`, and `external-dns`
+- `GitOps`
+  - owns the namespace-local `Certificate` manifests and workload wiring
diff --git a/docs/k3s-role-map.md b/docs/k3s-role-map.md
index 46aa633..1c4602e 100644
--- a/docs/k3s-role-map.md
+++ b/docs/k3s-role-map.md
@@ -20,6 +20,11 @@ Recommended path for a new platform project:
 
 `common -> k3s_platform_bootstrap -> k3s_platform_addon -> GitOps`
 
+### Edge And Certificate System
+
+The full certificate control-plane contract is documented in
+[cert-manager-arch.md](./cert-manager-arch.md).
+
 Responsibilities:
 
 - `common`
@@ -36,6 +41,8 @@ Responsibilities:
 - `k3s_platform_addon`
   - install platform-side shared components into Kubernetes
   - examples: `cert-manager`, `external-secrets`, `reloader`, `caddy`, `apisix`, `external-dns`
+  - certificate control plane, DNS automation, and external secret sync all belong in this layer
+  - see [cert-manager-arch.md](./cert-manager-arch.md) for the full edge and certificate contract
 - `GitOps`
   - own dynamic platform configuration
   - own workload manifests
@@ -206,6 +213,7 @@ When adding new capabilities:
 - k3s installation and Flux bootstrap belong in `k3s_platform_bootstrap`
 - platform shared addons belong in `k3s_platform_addon`
   - TLS issuers and namespace-local certificate lifecycle should also live there
+  - the complete edge and certificate contract lives in [cert-manager-arch.md](./cert-manager-arch.md)
 - server and agent lifecycle actions belong in `k3s-cluster-server` or `k3s-cluster-agent`
 - dynamic service configuration belongs in GitOps
 - reset and cleanup behavior belongs in `k3s-reset`
@@ -214,6 +222,8 @@ GitOps certificate rule of thumb:
 
 - use `cert-manager` to own the `Certificate` in the namespace that consumes it
 - avoid cross-namespace Secret sync jobs when the same certificate can be declared directly in each namespace
+- use `Caddy` only as the ingress / HTTP-01 routing surface for those certificates
+- use `external-dns` only for DNS record updates
 - keep `external-secrets` for Vault-sourced app credentials, cloud API keys, and image pull secrets
 
 Do not add new functionality to:
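
The Default Contract and Operational Rules added in `docs/cert-manager-arch.md` describe three issuer modes and one namespace-local `Certificate` per consumer namespace. The following is a minimal sketch of how that contract might be rendered into manifests; it is not taken from the repository. The hostname, the `postgresql-tls` Secret name, the `platform` and `database` namespaces, and the `cloudflare-api-token` Secret name come from the diff. Everything else is an assumption made for illustration: the use of `ClusterIssuer` rather than namespaced `Issuer` objects, the issuer names, the Let's Encrypt production endpoint, the `caddy` ingress class, and the `api-token` Secret key.

```python
import json

# Values taken from the diff.
HOSTNAME = "postgresql-prod.svc.plus"
SECRET_NAME = "postgresql-tls"
CONSUMER_NAMESPACES = ["platform", "database"]
CLOUDFLARE_SECRET = "cloudflare-api-token"

# Assumptions for illustration only (not stated in the diff).
ACME_SERVER = "https://acme-v02.api.letsencrypt.org/directory"
INGRESS_CLASS = "caddy"
CLOUDFLARE_SECRET_KEY = "api-token"


def cluster_issuer(name, spec):
    """Wrap an issuer spec in a cert-manager ClusterIssuer object."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": spec,
    }


def acme_spec(account_key_secret, solver):
    """Build an ACME issuer spec with a single challenge solver."""
    return {
        "acme": {
            "server": ACME_SERVER,
            "privateKeySecretRef": {"name": account_key_secret},
            "solvers": [solver],
        }
    }


def issuers():
    """Return the three issuer modes named in the Default Contract."""
    http01 = cluster_issuer(
        "acme-http01",
        acme_spec(
            "acme-http01-account-key",
            {"http01": {"ingress": {"ingressClassName": INGRESS_CLASS}}},
        ),
    )
    # For a ClusterIssuer, cert-manager reads this Secret from its
    # cluster resource namespace (by default the cert-manager namespace).
    dns01 = cluster_issuer(
        "acme-dns01-cloudflare",
        acme_spec(
            "acme-dns01-account-key",
            {"dns01": {"cloudflare": {"apiTokenSecretRef": {
                "name": CLOUDFLARE_SECRET,
                "key": CLOUDFLARE_SECRET_KEY,
            }}}},
        ),
    )
    self_signed = cluster_issuer("selfsigned", {"selfSigned": {}})
    return [http01, dns01, self_signed]


def certificate(namespace, issuer_name="acme-http01"):
    """Declare postgresql-tls directly in the namespace that consumes it."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": SECRET_NAME, "namespace": namespace},
        "spec": {
            "secretName": SECRET_NAME,
            "dnsNames": [HOSTNAME],
            "issuerRef": {
                "name": issuer_name,
                "kind": "ClusterIssuer",
                "group": "cert-manager.io",
            },
        },
    }


def render():
    """Collect issuers and per-namespace Certificates into one v1 List."""
    items = issuers() + [certificate(ns) for ns in CONSUMER_NAMESPACES]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(render(), indent=2))
```

The script prints JSON, which `kubectl apply -f -` accepts alongside YAML. Because each namespace gets its own `Certificate`, cert-manager writes a separate `postgresql-tls` Secret in each one and no sync job is needed. Under the same assumptions, passing `issuer_name="selfsigned"` to `certificate()` would switch a single namespace to the recovery fallback.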