{{- if and .Values.tenant.domain .Values.tenant.slug }}
# letsencrypt-prod- ClusterIssuer: DNS-01 challenge via Cloudflare,
# scoped to THIS tenant via the per-tenant CF token Secret. The
# `letsencrypt-prod-` naming MUST match tenantClusterIssuerName()
# in backend/cmd/api/tenant_substrate.go, because the per-instance
# overlay renderer in instance.go:504 stamps that exact name into every
# Certificate's issuerRef. Pre-0.7.3 charts used the unsuffixed name
# `letsencrypt-prod`, which broke for any instance asking for the
# slugged form. The qsoft2 migrate test on 2026-05-09 surfaced this:
# erp2's Certificate referenced letsencrypt-prod-qsoft, the chart only
# rendered letsencrypt-prod, cert-manager logged
# "Referenced ClusterIssuer not found", and erp2 kept serving the
# Traefik default cert indefinitely.
#
# Multi-zone: the solver has NO `selector.dnsZones` restriction. The
# tenant's Cloudflare token typically covers many zones (a tenant with
# 41 owned domains is normal), so we let cert-manager pick whichever
# zone matches the requested host. The token's access is the natural
# boundary: if it can't write a zone, the challenge fails loudly.
#
# Earlier the solver was scoped to `.Values.tenant.domain` only, which
# left instances on ANY other tenant-owned domain unable to issue (the
# `app.havari.me` symptom on a tenant whose primary domain is
# `4th.online`). Dropping the selector unifies single-zone and
# multi-zone tenants under one issuer.
#
# The cloudflare-api-token- Secret is now chart-managed via the ESO
# ExternalSecret in cloudflare-api-token-externalsecret.yaml, which
# pulls the token from OpenBao at v3/tenants//cloudflare-token. The
# naming is kept symmetric with that template.
#
# Sync wave 1 (Slice 2B.1.1, 2026-05-04). cert-manager itself installs
# at the default wave 0, and Argo waits for ALL wave-0 resources
# (cert-manager Deployments + webhook Service) to be Healthy before
# applying wave 1. Without this we hit a race: Argo applied this
# ClusterIssuer in the same wave as the cert-manager Deployments →
# cert-manager-webhook wasn't Ready yet → the admission webhook
# rejected the resource → Argo backed off exponentially for 30-90 s
# before retrying. retries=2 in the demo-server105 timing analysis was
# the smoking gun (Ready after 3 min instead of ~45 s).
#
# Ordering note: ClusterIssuer at wave 1, Certificate at wave 2 (in
# tenants-wildcard-cert.yaml). The Certificate references the
# ClusterIssuer by name, so the resource graph also reflects the
# logical dependency.
#
# Multi-tenant clusters (visiting tenants on a host tenant's cluster)
# remain a known gap (Item #9 follow-up): the ESO ExternalSecret loop
# only iterates over the cluster-owner tenant. When a future deploy
# lands a non-owner tenant on a cluster, that tenant's CF Secret and
# Issuer must be applied out-of-band until this template grows a
# `Values.tenants[]` loop and Tower's onboarding code populates it.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-{{ .Values.tenant.slug }}
  annotations:
    argocd.argoproj.io/sync-wave: "1"
  labels:
    app.kubernetes.io/managed-by: cluster-platform-v3
    odoosky.io/tenant: {{ .Values.tenant.id | quote }}
spec:
  acme:
    email: {{ required "acme.email is required" .Values.acme.email | quote }}
    server: {{ .Values.acme.server | quote }}
    privateKeySecretRef:
      # Slug-suffixed so each tenant has its own ACME account key:
      # cleaner isolation if a tenant rotates or audits, and no implicit
      # shared state if two tenants ever land on one cluster.
      name: letsencrypt-prod-account-key-{{ .Values.tenant.slug }}
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token-{{ .Values.tenant.slug }}
              key: api-token
{{- end }}
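{{- /*
Illustrative only (a Go-template comment, so Helm never renders it):
a minimal sketch of a Certificate consuming this issuer, assuming the
tenant slug `qsoft` from the 2026-05-09 incident described above. The
metadata.name, secretName, and dnsNames values are hypothetical
placeholders; only the issuerRef shape matters here. issuerRef.name
must be the slugged issuer name, and issuerRef.kind must be
ClusterIssuer, because cert-manager defaults an omitted kind to a
namespaced Issuer and would look for that instead.

  apiVersion: cert-manager.io/v1
  kind: Certificate
  metadata:
    name: erp2-tls                        # hypothetical
    annotations:
      argocd.argoproj.io/sync-wave: "2"   # after this issuer (wave 1)
  spec:
    secretName: erp2-tls                  # hypothetical
    dnsNames:
      - erp2.example.com                  # hypothetical host
    issuerRef:
      group: cert-manager.io
      kind: ClusterIssuer
      name: letsencrypt-prod-qsoft        # must equal this issuer's name
*/}}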