cluster-issuer.yaml: name → letsencrypt-prod-{{ .Values.tenant.slug }};
hard-pin apiTokenSecretRef.name to
cloudflare-api-token-{{ .Values.tenant.slug }} so it matches the
ESO-created Secret. The ACME account key is also slug-suffixed for
tenant isolation. Before 0.7.3, the unsuffixed letsencrypt-prod name
did not match what instance.go:504 stamps into per-instance Certificates
(letsencrypt-prod-<slug>), so cert-manager logged 'Referenced
ClusterIssuer not found' and erp2 kept serving the Traefik default cert.
tenants-wildcard-cert.yaml: issuerRef.name → letsencrypt-prod-{{ $.Values.tenant.slug }}
to match the renamed ClusterIssuer.
values.yaml: secrets.cloudflareTokenSecret block deprecated (the chart
no longer reads it; kept for back-compat with external overrides).
Diagnosed during the qsoft2 migrate test on 2026-05-09.
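
For reference, the matching change in tenants-wildcard-cert.yaml might look roughly like the sketch below. Only the issuerRef.name value and the wave-2 ordering come from this change; the Certificate name, secretName, and dnsNames entry are illustrative assumptions, not the chart's actual values.

```yaml
# Hypothetical excerpt of tenants-wildcard-cert.yaml. Everything except
# issuerRef.name and the sync wave is an assumption for illustration.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: tenants-wildcard                  # assumed name
  annotations:
    argocd.argoproj.io/sync-wave: "2"     # applied after the ClusterIssuer (wave 1)
spec:
  secretName: tenants-wildcard-tls        # assumed name
  dnsNames:
    - "*.{{ $.Values.tenant.domain }}"    # assumed host pattern
  issuerRef:
    kind: ClusterIssuer
    # Must equal the ClusterIssuer's metadata.name for this tenant.
    name: letsencrypt-prod-{{ $.Values.tenant.slug }}
```

If the two names drift apart again, cert-manager would be expected to report the same 'Referenced ClusterIssuer not found' condition on the Certificate.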
77 lines
3.7 KiB
YAML
{{- if and .Values.tenant.domain .Values.tenant.slug }}
# letsencrypt-prod-<slug> ClusterIssuer: DNS-01 challenge via Cloudflare,
# scoped to THIS tenant via the per-tenant CF token Secret. The
# `letsencrypt-prod-<slug>` naming MUST match tenantClusterIssuerName()
# in backend/cmd/api/tenant_substrate.go; the per-instance overlay
# renderer in instance.go:504 stamps that exact name into every
# Certificate's issuerRef. Pre-0.7.3 charts used the unsuffixed name
# `letsencrypt-prod`, which broke for any instance asking for the
# slugged form (the qsoft2 migrate test on 2026-05-09 surfaced this:
# erp2's Certificate referenced letsencrypt-prod-qsoft, the chart only
# rendered letsencrypt-prod, cert-manager logged "Referenced ClusterIssuer
# not found", erp2 served the Traefik default cert forever).
#
# Multi-zone: the solver has NO `selector.dnsZones` restriction. The
# tenant's Cloudflare token typically covers many zones (a tenant with
# 41 owned domains is normal); we let cert-manager pick whichever zone
# matches the requested host. The token's access is the natural
# boundary: if it can't write a zone, the challenge fails loudly.
#
# Earlier the solver was scoped to `.Values.tenant.domain` only, which
# made instances on ANY other tenant-owned domain unable to issue (the
# `app.havari.me` symptom on a tenant whose primary domain is
# `4th.online`). Dropping the selector unifies single-zone and
# multi-zone tenants under one issuer.
#
# The cloudflare-api-token-<slug> Secret is now chart-managed via the
# ESO ExternalSecret in cloudflare-api-token-externalsecret.yaml (which
# pulls the token from OpenBao at v3/tenants/<id>/cloudflare-token).
# Naming kept symmetric with that template.
#
# Sync wave 1 (Slice 2B.1.1, 2026-05-04). cert-manager itself
# installs at the default wave 0; Argo waits for ALL wave-0
# resources (cert-manager Deployments + webhook Service) to be
# Healthy before applying wave 1. Without this we hit a race:
# Argo applied this ClusterIssuer in the same wave as cert-manager
# Deployments → cert-manager-webhook wasn't Ready yet → admission
# webhook rejected the resource → Argo backed off exponentially
# 30-90s before retrying. retries=2 was the smoking gun in the
# demo-server105 timing analysis (3 min ready instead of ~45 s).
#
# Note ordering: ClusterIssuer at wave 1, Certificate at wave 2
# (in tenants-wildcard-cert.yaml). The Certificate references the
# ClusterIssuer by name, so the resource graph also reflects the
# logical dependency.
#
# Multi-tenant clusters (visiting tenants on a host tenant's cluster)
# remain a known gap (Item #9 follow-up): the ESO ExternalSecret loop
# only iterates the cluster-owner tenant. When a future deploy lands a
# non-owner tenant on a cluster, that tenant's CF Secret + Issuer must
# be applied out-of-band until this template grows a `Values.tenants[]`
# loop and Tower's onboarding code populates it.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-{{ .Values.tenant.slug }}
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  labels:
    app.kubernetes.io/managed-by: cluster-platform-v3
    odoosky.io/tenant: {{ .Values.tenant.id | quote }}
spec:
  acme:
    email: {{ required "acme.email is required" .Values.acme.email | quote }}
    server: {{ .Values.acme.server | quote }}
    privateKeySecretRef:
      # Slug-suffixed so each tenant has its own ACME account key:
      # cleaner isolation if a tenant rotates / audits, and avoids
      # implicit shared state if two tenants ever land on one cluster.
      name: letsencrypt-prod-account-key-{{ .Values.tenant.slug }}
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token-{{ .Values.tenant.slug }}
              key: api-token
{{- end }}
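
As a rough illustration, rendering this template for the tenant slug `qsoft` (the slug implied by the letsencrypt-prod-qsoft reference in the header comment) would be expected to produce something like the manifest below. The tenant id, email address, and ACME server value are placeholders assumed for the example; the server shown is Let's Encrypt's public production directory, which may or may not be what `.Values.acme.server` is set to in a given deployment.

```yaml
# Illustrative rendered output only. The tenant id, email, and server
# values are assumptions; the names follow the template's slug suffixing.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-qsoft
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  labels:
    app.kubernetes.io/managed-by: cluster-platform-v3
    odoosky.io/tenant: "tenant-id-placeholder"
spec:
  acme:
    email: "ops@example.com"
    server: "https://acme-v02.api.letsencrypt.org/directory"
    privateKeySecretRef:
      name: letsencrypt-prod-account-key-qsoft
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token-qsoft
              key: api-token
```

A per-instance Certificate whose issuerRef.name is letsencrypt-prod-qsoft would then resolve against this issuer, and the solver would read the token from the cloudflare-api-token-qsoft Secret under the `api-token` key.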