# tenants-wildcard Certificate(s) — one per VERIFIED domain in
# tenant.domains[] (#320.C). The primary entry keeps the canonical
# `tenants-wildcard` / `tenants-wildcard-tls` names so existing
# instances (whose IngressRoute references that exact secret) keep
# serving without re-deploy. Each non-primary domain gets its own
# Certificate + Secret named after the root with `.` → `-`, so the
# cluster ends up with N TLS Secrets — one per tenant domain — and
# instances can pick the right one based on their host.
#
# Legacy fallback: when tenant.domains[] is empty (a chart consumer
# from before #320.A), synthesize a single entry from the scalar
# tenant.wildcardHost so this template stays one-pass.
#
# Verified=false entries are skipped on purpose — that's the safety
# valve called out in #320.A. A half-configured add-domain (root set,
# DNS not yet pointed) waits in the data layer; the chart doesn't
# try to issue and stall the whole sync.
#
# DNS-01 takes 30–90 s on a fast day, 5–10 min on a slow one
# (Cloudflare zone propagation + LE order processing). Until Slice
# 2B.1 (2026-05-04) the wildcard Certificate's Ready status gated
# the entire Argo Application's Health — meaning Connect Server
# sat at "Provisioning…" for the full 5–10 min before substrate
# became "Ready", even though all the BASE infra (longhorn,
# cert-manager, traefik, registry) was up within ~30 s.
#
# The annotation `argocd.argoproj.io/sync-options: SkipHealthCheck=true`
# below tells Argo "still sync this resource, but don't include
# its Ready status when computing the parent Application's Health".
# Result: substrate becomes Ready in ~30 s; the wildcard issues in
# the background.
#
# Tradeoff: an instance deployed inside the first ~5 min after
# Connect references a Secret (`tenants-wildcard-tls`) that doesn't
# exist yet — its IngressRoute is healthy but TLS is unavailable.
# Slice 2B.2 will plumb a per-host HTTP-01 fallback so the very
# first deploy is also fast. Until then the operator should know:
# Substrate Ready ≠ wildcard ready. Watch for the Secret to appear
# (`kubectl -n tenants get secret tenants-wildcard-tls`) before the
# first deploy on a fresh cluster.
{{- $domains := .Values.tenant.domains | default (list) }}
{{- if and (eq (len $domains) 0) .Values.tenant.wildcardHost }}
  {{- $domains = list (dict "root" .Values.tenant.domain "wildcardHost" .Values.tenant.wildcardHost "primary" true "verified" true) }}
{{- end }}
{{- range $i, $d := $domains }}
{{- if and $d.verified $d.wildcardHost }}
{{- $suffix := "" }}
{{- if not $d.primary }}
  {{- $suffix = printf "-%s" (replace "." "-" $d.root) }}
{{- end }}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ printf "tenants-wildcard%s" $suffix | quote }}
  namespace: tenants
  labels:
    app.kubernetes.io/managed-by: cluster-platform-v3
    odoosky.io/domain-root: {{ $d.root | quote }}
    {{- if $d.primary }}
    odoosky.io/domain-primary: "true"
    {{- end }}
  annotations:
    # Slice 2B.1 — substrate Ready in ~30 s. Argo will still
    # sync this Certificate (cert-manager will issue it via
    # DNS-01 in the background), but its Ready condition does
    # NOT gate the parent Application's Health calculation. So
    # the cluster-platform-v3 App flips Healthy as soon as the
    # base components (longhorn + cert-manager + traefik +
    # registry) are up, instead of waiting 5–10 min for LE to
    # finish the wildcard issuance.
    argocd.argoproj.io/sync-options: SkipHealthCheck=true
spec:
  secretName: {{ printf "tenants-wildcard%s-tls" $suffix | quote }}
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  commonName: {{ $d.wildcardHost | quote }}
  dnsNames:
    - {{ $d.wildcardHost | quote }}
  # Renew 30 days before expiry — Let's Encrypt certs are 90-day, so
  # this gives cert-manager a 30-day window to retry if Cloudflare
  # has a bad day during renewal.
  renewBefore: 720h
{{- end }}
{{- end }}
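#
# Example values (a sketch only: the field names are inferred from how
# this template reads tenant.domains[], and the hostnames below are
# placeholders, not real tenant domains):
#
#   tenant:
#     domains:
#       - root: example.com
#         wildcardHost: "*.example.com"
#         primary: true
#         verified: true
#       - root: example.org
#         wildcardHost: "*.example.org"
#         verified: true
#       - root: example.net
#         wildcardHost: "*.example.net"
#         verified: false   # skipped until the domain is verified
#
# Under those assumed values the naming rules above should yield:
#   Certificate tenants-wildcard              -> Secret tenants-wildcard-tls
#   Certificate tenants-wildcard-example-org  -> Secret tenants-wildcard-example-org-tls
# and nothing for example.net, because verified is false.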