docs(playbooks): add cert-manager architecture

Haitao Pan 2026-04-04 11:25:22 +08:00
parent b150174d1b
commit 36b1425365
2 changed files with 87 additions and 0 deletions

docs/cert-manager-arch.md Normal file

@ -0,0 +1,77 @@
# cert-manager Architecture
This document records the complete certificate control-plane contract for the `svc.plus` platform.
## Scope
The system is split into four distinct responsibilities:
- `cert-manager` owns certificate issuance, renewal, and the target `Secret` objects.
- `Caddy` remains the ingress surface and serves HTTP-01 challenge traffic.
- `external-dns` manages only the DNS records for public hostnames.
- `external-secrets` continues to materialize Vault-sourced application secrets, AK/SK pairs, future provider credentials such as the Cloudflare API token, and image pull secrets.
## Default Contract
- `postgresql-prod.svc.plus` defaults to `cert-manager + ACME HTTP-01`.
- `DNS-01 + Cloudflare` is predeclared for wildcard certificates and future subdomains.
- `selfSigned` remains available as a temporary internal issuer or as a recovery fallback.
- `cert-manager` owns `postgresql-tls` in every namespace that consumes it, so there is no cross-namespace Secret sync job.
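
As a minimal sketch of the namespace-local contract above, the same `postgresql-tls` certificate could be declared once in each consumer namespace. The `platform` and `database` namespaces follow the system diagram; the issuer name `letsencrypt-http01` is a hypothetical placeholder that this document does not define.

```yaml
# Sketch only. The issuer name below is an assumed placeholder.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgresql-tls
  namespace: platform
spec:
  secretName: postgresql-tls        # Secret created and renewed by cert-manager
  dnsNames:
    - postgresql-prod.svc.plus
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-http01        # hypothetical issuer name
---
# Same declaration repeated in the second consumer namespace,
# instead of copying the Secret across namespaces.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: postgresql-tls
  namespace: database
spec:
  secretName: postgresql-tls
  dnsNames:
    - postgresql-prod.svc.plus
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-http01        # hypothetical issuer name
```

Each `Certificate` is issued and renewed independently, so each namespace holds its own `postgresql-tls` Secret and no sync controller is involved.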
## System Diagram
```mermaid
flowchart LR
Vault[(Vault)]
ESO[external-secrets]
CloudflareToken[(cloudflare-api-token Secret)]
ExternalDNS[external-dns]
DNSZone[(svc.plus DNS zone)]
Caddy[Caddy Ingress]
CertMgr[cert-manager]
Http01["ACME HTTP-01"]
Dns01["ACME DNS-01 + Cloudflare"]
SelfSigned["selfSigned fallback"]
PlatformCert["platform/postgresql-tls Certificate"]
PlatformSecret["platform/postgresql-tls Secret"]
DatabaseCert["database/postgresql-tls Certificate"]
DatabaseSecret["database/postgresql-tls Secret"]
PostgreSQL["postgresql-prod.svc.plus"]
Stunnel["database/stunnel-server"]
Vault --> ESO
ESO --> CloudflareToken
CloudflareToken --> Dns01
ExternalDNS --> DNSZone
DNSZone --> Caddy
Caddy --> Http01
Http01 --> CertMgr
Dns01 --> CertMgr
SelfSigned -. fallback .-> CertMgr
CertMgr --> PlatformCert
CertMgr --> DatabaseCert
PlatformCert --> PlatformSecret
DatabaseCert --> DatabaseSecret
PlatformSecret --> Caddy
DatabaseSecret --> Stunnel
Caddy --> PostgreSQL
```
## Operational Rules
- Keep `cert-manager` as the source of truth for TLS Secret ownership.
- Keep `Caddy` as the traffic and HTTP-01 routing layer only.
- Keep `external-dns` focused on DNS record reconciliation.
- Keep `external-secrets` focused on external secret materialization.
- Treat the Cloudflare API token as an external input secret; it can be bootstrapped manually or delivered by `external-secrets` when that path is wired in.
- Prefer namespace-local `Certificate` objects for each consumer namespace.
- Avoid cross-namespace certificate copying or Secret sync controllers.
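
The three issuance paths named in this document could be predeclared as separate `ClusterIssuer` objects, roughly as sketched below. Everything specific here is an assumption for illustration: the issuer names, the Let's Encrypt directory URL, the contact email, the `caddy` ingress class, and the `api-token` key inside the `cloudflare-api-token` Secret.

```yaml
# Sketch only. Names, ACME server, email, ingress class, and Secret key are assumptions.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-http01          # hypothetical name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory   # assumed ACME CA
    email: ops@example.com          # placeholder contact
    privateKeySecretRef:
      name: letsencrypt-http01-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: caddy # assumed class; older cert-manager releases use `class`
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01           # hypothetical name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory   # assumed ACME CA
    email: ops@example.com          # placeholder contact
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token   # Secret name taken from the system diagram
              key: api-token               # assumed key name
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-fallback         # hypothetical name
spec:
  selfSigned: {}
```

For a `ClusterIssuer`, cert-manager looks up the referenced token Secret in its cluster resource namespace (by default `cert-manager`), so whichever path delivers the Cloudflare token would need to place it there.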
## Related Playbook Roles
- `vhosts/k3s_platform_bootstrap`
  - installs the platform node and prepares GitOps handoff
- `vhosts/k3s_platform_addon`
  - installs shared platform services such as `cert-manager`, `external-secrets`, `caddy`, and `external-dns`
- `GitOps`
  - owns the namespace-local `Certificate` manifests and workload wiring


@ -20,6 +20,11 @@ Recommended path for a new platform project:
`common -> k3s_platform_bootstrap -> k3s_platform_addon -> GitOps`
### Edge And Certificate System
The full certificate control-plane contract is documented in
[cert-manager-arch.md](./cert-manager-arch.md).
Responsibilities:
- `common`
@ -36,6 +41,8 @@ Responsibilities:
- `k3s_platform_addon`
  - install platform-side shared components into Kubernetes
  - examples: `cert-manager`, `external-secrets`, `reloader`, `caddy`, `apisix`, `external-dns`
  - certificate control plane, DNS automation, and external secret sync all belong in this layer
  - see [cert-manager-arch.md](./cert-manager-arch.md) for the full edge and certificate contract
- `GitOps`
  - own dynamic platform configuration
  - own workload manifests
@ -206,6 +213,7 @@ When adding new capabilities:
- k3s installation and Flux bootstrap belong in `k3s_platform_bootstrap`
- platform shared addons belong in `k3s_platform_addon`
- TLS issuers and namespace-local certificate lifecycle should also live there
- the complete edge and certificate contract lives in [cert-manager-arch.md](./cert-manager-arch.md)
- server and agent lifecycle actions belong in `k3s-cluster-server` or `k3s-cluster-agent`
- dynamic service configuration belongs in GitOps
- reset and cleanup behavior belongs in `k3s-reset`
@ -214,6 +222,8 @@ GitOps certificate rule of thumb:
- use `cert-manager` to own the `Certificate` in the namespace that consumes it
- avoid cross-namespace Secret sync jobs when the same certificate can be declared directly in each namespace
- use `Caddy` only as the ingress / HTTP-01 routing surface for those certificates
- use `external-dns` only for DNS record updates
- keep `external-secrets` for Vault-sourced app credentials, cloud API keys, and image pull secrets
Do not add new functionality to: