feat(slice 2B.1): SkipHealthCheck on tenants-wildcard Cert (chart 0.5.4)

This commit is contained in:
OdooSky v3
2026-05-04 12:38:18 +03:00
parent b6d5b29f3e
commit 46e8309153
2 changed files with 34 additions and 7 deletions

@@ -23,8 +23,8 @@ description: |
 Git).
 type: application
-version: 0.5.3
-appVersion: "0.5.3"
+version: 0.5.4
+appVersion: "0.5.4"
 dependencies:
 - name: cert-manager

@@ -16,11 +16,28 @@
 # DNS not yet pointed) waits in the data layer; the chart doesn't
 # try to issue and stall the whole sync.
 #
-# DNS-01 takes 30-90 s in normal Cloudflare conditions; cert-manager
-# retries forever on transient failures. The Argo Application that
-# deploys this chart is "Healthy" only when EVERY Certificate's
-# Ready condition flips to True — multi-domain deploys take a
-# proportionally longer first sync.
+# DNS-01 takes 30-90 s on a fast day, 5-10 min on a slow one
+# (Cloudflare zone propagation + LE order processing). Until Slice
+# 2B.1 (2026-05-04) the wildcard Certificate's Ready status gated
+# the entire Argo Application's Health — meaning Connect Server
+# sat at "Provisioning…" for the full 5-10 min before substrate
+# became "Ready", even though all the BASE infra (longhorn,
+# cert-manager, traefik, registry) was up within ~30 s.
+#
+# The annotation `argocd.argoproj.io/sync-options: SkipHealthCheck=true`
+# below tells Argo "still sync this resource, but don't include
+# its Ready status when computing the parent Application's Health".
+# Result: substrate becomes Ready in ~30 s; the wildcard issues in
+# the background.
+#
+# Tradeoff: an instance deployed inside the first ~5 min after
+# Connect references a Secret (`tenants-wildcard-tls`) that doesn't
+# exist yet — its IngressRoute is healthy but TLS is unavailable.
+# Slice 2B.2 will plumb a per-host HTTP-01 fallback so the very
+# first deploy is also fast. Until then the operator should know:
+# Substrate Ready ≠ wildcard ready. Watch for the Secret to appear
+# (`kubectl -n tenants get secret tenants-wildcard-tls`) before the
+# first deploy on a fresh cluster.
 {{- $domains := .Values.tenant.domains | default (list) }}
 {{- if and (eq (len $domains) 0) .Values.tenant.wildcardHost }}
 {{- $domains = list (dict
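The tradeoff note asks the operator to wait for `tenants-wildcard-tls` to exist before the first deploy on a fresh cluster. Below is a minimal Python sketch of that wait. It assumes `kubectl` is on the PATH and that the current context points at the new cluster. The helpers `secret_exists` and `wait_for`, along with the 15-minute default timeout and 10 s poll interval, are hypothetical. The timeout is chosen to leave headroom over the slow-day issuance time the comment describes.

```python
import subprocess
import time
from typing import Callable


def secret_exists(namespace: str = "tenants", name: str = "tenants-wildcard-tls") -> bool:
    """Hypothetical helper: True if `kubectl -n <ns> get secret <name>` exits 0."""
    result = subprocess.run(
        ["kubectl", "-n", namespace, "get", "secret", name],
        capture_output=True,  # keep kubectl's NotFound noise off the terminal
    )
    return result.returncode == 0


def wait_for(
    check: Callable[[], bool],
    timeout_s: float = 900.0,  # assumed 15 min budget, not a value from the chart
    interval_s: float = 10.0,  # assumed poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` every `interval_s` seconds until it returns True.

    Returns True as soon as the check passes, or False once `timeout_s`
    has elapsed. `clock` and `sleep` are injectable so the loop can be
    exercised without a cluster or real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

On a real cluster, the call would be `wait_for(secret_exists)` run just before the first instance deploy. A `False` result means the wildcard is still issuing and TLS would not yet be available for that instance.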
@@ -47,6 +64,16 @@ metadata:
 {{- if $d.primary }}
 odoosky.io/domain-primary: "true"
 {{- end }}
+annotations:
+# Slice 2B.1 — substrate Ready in ~30 s. Argo will still
+# sync this Certificate (cert-manager will issue it via
+# DNS-01 in the background), but its Ready condition does
+# NOT gate the parent Application's Health calculation. So
+# the cluster-platform-v3 App flips Healthy as soon as the
+# base components (longhorn + cert-manager + traefik +
+# registry) are up, instead of waiting 5-10 min for LE to
+# finish the wildcard issuance.
+argocd.argoproj.io/sync-options: SkipHealthCheck=true
 spec:
 secretName: {{ printf "tenants-wildcard%s-tls" $suffix | quote }}
 issuerRef:
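The following toy Python model illustrates the effect the annotation comment describes. A parent Application's Health is rolled up as the worst status among its child resources, and any resource whose `argocd.argoproj.io/sync-options` annotation contains `SkipHealthCheck=true` is left out of that roll-up. This is an illustration under stated assumptions, not Argo CD's implementation. The severity order in `HEALTH_ORDER`, the `Resource` type, and the helper names are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed severity order (least to most severe); a simplification for this
# sketch, not copied from Argo CD's source.
HEALTH_ORDER = ["Healthy", "Suspended", "Progressing", "Missing", "Degraded", "Unknown"]

SYNC_OPTIONS_KEY = "argocd.argoproj.io/sync-options"


@dataclass
class Resource:
    """Hypothetical stand-in for a child resource tracked by the Application."""
    kind: str
    name: str
    health: str
    annotations: Dict[str, str] = field(default_factory=dict)


def skips_health_check(res: Resource) -> bool:
    """True if the comma-separated sync-options annotation includes SkipHealthCheck=true."""
    raw = res.annotations.get(SYNC_OPTIONS_KEY, "")
    return "SkipHealthCheck=true" in (opt.strip() for opt in raw.split(","))


def app_health(resources: List[Resource]) -> str:
    """Worst health among non-skipped resources; Healthy if every resource is skipped."""
    counted = [r.health for r in resources if not skips_health_check(r)]
    if not counted:
        return "Healthy"
    return max(counted, key=HEALTH_ORDER.index)


if __name__ == "__main__":
    base = [
        Resource("Deployment", "traefik", "Healthy"),
        Resource("StatefulSet", "registry", "Healthy"),
    ]
    # Wildcard Certificate still issuing via DNS-01.
    pending = Resource("Certificate", "tenants-wildcard", "Progressing")
    skipped = Resource(
        "Certificate", "tenants-wildcard", "Progressing",
        {SYNC_OPTIONS_KEY: "SkipHealthCheck=true"},
    )
    print(app_health(base + [pending]))  # Progressing: the Certificate gates Health
    print(app_health(base + [skipped]))  # Healthy: the Certificate is ignored
```

In this model, the unannotated wildcard Certificate holds the whole Application at `Progressing`. The annotated one lets the base components alone decide, which matches the ~30 s substrate-Ready behaviour the commit is after.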