Add the chart-side machinery that lets Tower bypass the cert-manager
Certificate path on Reconnect by injecting a Vault-stashed wildcard
cert directly as a kubernetes.io/tls Secret.

values.yaml:
  certManager.injectedWildcards: []
  Each entry: { root, primary, crt, key }. Empty list = legacy ACME-only.
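For illustration, a single injected entry might look like the sketch below. The domain and the PEM bodies are placeholders, and passing `crt`/`key` as inline PEM text is an assumption; the chart may expect a different encoding.

```yaml
# Hypothetical values override; example.com and the PEM bodies are placeholders.
certManager:
  injectedWildcards:
    - root: example.com   # must match a root in tenant.domains[]
      primary: true       # primary keeps the canonical tenants-wildcard-tls name
      crt: |
        -----BEGIN CERTIFICATE-----
        REPLACE_WITH_CERTIFICATE_BODY
        -----END CERTIFICATE-----
      key: |
        -----BEGIN PRIVATE KEY-----
        REPLACE_WITH_KEY_BODY
        -----END PRIVATE KEY-----
```

The cert template only indexes an entry when `root`, `crt`, and `key` are all set, so an incomplete entry falls back to the ACME path.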

templates/tenants-wildcard-cert.yaml:
  Build $injectedRoots index from injectedWildcards[]; per-domain
  Certificate is skipped when its root has an injected entry.

templates/tenants-wildcard-secret.yaml (NEW):
  Per injected entry, render a kubernetes.io/tls Secret using the same
  name the cert path would have produced (tenants-wildcard-tls for the
  primary, tenants-wildcard-<root-as-dashes>-tls for non-primary).
  Sync-wave 2 to match the cert path's timing. Labeled
  odoosky.io/wildcard-source=vault-injected so the harvester can skip
  them.
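A minimal sketch of what such a Secret template could look like, based only on the description above. This is not the actual tenants-wildcard-secret.yaml: it assumes `crt` and `key` arrive as plain PEM and are base64-encoded by the chart, and it reuses the suffix logic from the cert template.

```yaml
{{- /* Hypothetical sketch, not the real template. Assumes .crt/.key are plain PEM. */ -}}
{{- range .Values.certManager.injectedWildcards | default (list) }}
{{- if and .root .crt .key }}
{{- $suffix := "" }}
{{- if not .primary }}
{{- $suffix = printf "-%s" (replace "." "-" .root) }}
{{- end }}
---
apiVersion: v1
kind: Secret
metadata:
  # Same name the Certificate's spec.secretName would have produced.
  name: {{ printf "tenants-wildcard%s-tls" $suffix | quote }}
  namespace: tenants
  labels:
    app.kubernetes.io/managed-by: cluster-platform-v3
    odoosky.io/wildcard-source: vault-injected
  annotations:
    argocd.argoproj.io/sync-wave: "2"
type: kubernetes.io/tls
data:
  tls.crt: {{ .crt | b64enc | quote }}
  tls.key: {{ .key | b64enc | quote }}
{{- end }}
{{- end }}
```

Reusing the Certificate's secret name means existing IngressRoutes keep resolving the same Secret regardless of which path produced it.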

Verified via helm template + self-signed dummy cert:
  - Pure injection: 0 Certificate, 1 Secret (correct name + base64)
  - Pure ACME: 1 Certificate, 0 Secret (status quo)
  - Mixed (2 domains, 1 injected): 1 Certificate + 1 Secret
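The mixed case could be reproduced with values along these lines. This is a sketch: the domains and the cert/key placeholders are hypothetical, and which of the two domains was injected in the original test is not stated.

```yaml
# Hypothetical values for the mixed case: two verified domains, one injected.
tenant:
  domains:
    - root: example.com
      wildcardHost: "*.example.com"
      primary: true
      verified: true
    - root: example.org
      wildcardHost: "*.example.org"
      primary: false
      verified: true
certManager:
  injectedWildcards:
    - root: example.org            # this root skips the Certificate path
      primary: false
      crt: REPLACE_WITH_PEM_CERT   # placeholder, e.g. a self-signed dummy cert
      key: REPLACE_WITH_PEM_KEY    # placeholder
```

Under the naming rules described above, this should render a Certificate named tenants-wildcard for example.com and a Secret named tenants-wildcard-example-org-tls for example.org.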

Inert without Tower wiring — existing clusters render identically to
0.5.6 because injectedWildcards defaults to []. Pushed first as the
foundation layer for the upcoming Tower restore + harvester slices.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
110 lines
4.9 KiB
YAML
# tenants-wildcard Certificate(s) — one per VERIFIED domain in
# tenant.domains[] (#320.C). The primary entry keeps the canonical
# `tenants-wildcard` / `tenants-wildcard-tls` names so existing
# instances (whose IngressRoute references that exact secret) keep
# serving without re-deploy. Each non-primary domain gets its own
# Certificate + Secret named after the root with `.` → `-`, so the
# cluster ends up with N TLS Secrets — one per tenant domain — and
# instances can pick the right one based on their host.
#
# Legacy fallback: when tenant.domains[] is empty (a chart consumer
# from before #320.A), synthesize a single entry from the scalar
# tenant.wildcardHost so this template stays one-pass.
#
# Verified=false entries are skipped on purpose — that's the safety
# valve called out in #320.A. A half-configured add-domain (root set,
# DNS not yet pointed) waits in the data layer; the chart doesn't
# try to issue and stall the whole sync.
#
# DNS-01 takes 30–90 s on a fast day, 5–10 min on a slow one
# (Cloudflare zone propagation + LE order processing). Until Slice
# 2B.1 (2026-05-04) the wildcard Certificate's Ready status gated
# the entire Argo Application's Health — meaning Connect Server
# sat at "Provisioning…" for the full 5–10 min before substrate
# became "Ready", even though all the BASE infra (longhorn,
# cert-manager, traefik, registry) was up within ~30 s.
#
# The annotation `argocd.argoproj.io/sync-options: SkipHealthCheck=true`
# below tells Argo "still sync this resource, but don't include
# its Ready status when computing the parent Application's Health".
# Result: substrate becomes Ready in ~30 s; the wildcard issues in
# the background.
#
# Tradeoff: an instance deployed inside the first ~5 min after
# Connect references a Secret (`tenants-wildcard-tls`) that doesn't
# exist yet — its IngressRoute is healthy but TLS is unavailable.
# Slice 2B.2 will plumb a per-host HTTP-01 fallback so the very
# first deploy is also fast. Until then the operator should know:
# Substrate Ready ≠ wildcard ready. Watch for the Secret to appear
# (`kubectl -n tenants get secret tenants-wildcard-tls`) before the
# first deploy on a fresh cluster.
{{- $domains := .Values.tenant.domains | default (list) }}
{{- if and (eq (len $domains) 0) .Values.tenant.wildcardHost }}
{{- $domains = list (dict
    "root" .Values.tenant.domain
    "wildcardHost" .Values.tenant.wildcardHost
    "primary" true
    "verified" true) }}
{{- end }}
{{/* Slice 2B.3 — index of roots that have a Vault-stashed cert
     injected via certManager.injectedWildcards[]. We skip the
     Certificate resource entirely for those; the sibling
     tenants-wildcard-secret.yaml renders the kubernetes.io/tls
     Secret directly so no ACME order is placed. */}}
{{- $injectedRoots := dict }}
{{- range .Values.certManager.injectedWildcards | default (list) }}
  {{- if and .root .crt .key }}
    {{- $_ := set $injectedRoots .root true }}
  {{- end }}
{{- end }}
{{- range $i, $d := $domains }}
{{- if and $d.verified $d.wildcardHost (not (hasKey $injectedRoots $d.root)) }}
{{- $suffix := "" }}
{{- if not $d.primary }}
{{- $suffix = printf "-%s" (replace "." "-" $d.root) }}
{{- end }}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ printf "tenants-wildcard%s" $suffix | quote }}
  namespace: tenants
  labels:
    app.kubernetes.io/managed-by: cluster-platform-v3
    odoosky.io/domain-root: {{ $d.root | quote }}
    {{- if $d.primary }}
    odoosky.io/domain-primary: "true"
    {{- end }}
  annotations:
    # Slice 2B.1 — substrate Ready in ~30 s. Argo will still
    # sync this Certificate (cert-manager will issue it via
    # DNS-01 in the background), but its Ready condition does
    # NOT gate the parent Application's Health calculation. So
    # the cluster-platform-v3 App flips Healthy as soon as the
    # base components (longhorn + cert-manager + traefik +
    # registry) are up, instead of waiting 5–10 min for LE to
    # finish the wildcard issuance.
    argocd.argoproj.io/sync-options: SkipHealthCheck=true
    # Slice 2B.1.1 — wave 2: apply AFTER the ClusterIssuer
    # (wave 1) which depends on cert-manager (wave 0 default).
    # Argo enforces strict wave ordering with health-gating
    # between waves, so the Certificate never lands before its
    # ClusterIssuer exists or before cert-manager-webhook is
    # accepting admission requests. Eliminates the retries=2
    # exponential-backoff penalty observed on demo-server105.
    argocd.argoproj.io/sync-wave: "2"
spec:
  secretName: {{ printf "tenants-wildcard%s-tls" $suffix | quote }}
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  commonName: {{ $d.wildcardHost | quote }}
  dnsNames:
    - {{ $d.wildcardHost | quote }}
  # Renew 30 days before expiry — Let's Encrypt certs are 90-day, so
  # this gives cert-manager a 30-day window to retry if Cloudflare
  # has a bad day during renewal.
  renewBefore: 720h
{{- end }}
{{- end }}