0.2.0 — vendor cert-manager + traefik; parameterized substrate

bootstrap.sh-equivalent K8s manifests now ship as part of this
chart instead of being installed inline by the customer's
`curl … | sudo bash`. Result: customer terminal time drops from
~5 min to ~1 min once Tower's SubmitConnect (B2) creates the
per-cluster Argo Application that points here.

What's vendored:
  - cert-manager v1.16.1 (helm dep, charts/cert-manager-v1.16.1.tgz)
  - traefik 33.2.1       (helm dep, charts/traefik-33.2.1.tgz)

What's parameterized via .Values.tenant.{domain,wildcardHost}:
  - letsencrypt-prod ClusterIssuer (DNS-01 + tenant's Cloudflare zone)
  - tenants Namespace
  - tenants-wildcard Certificate (commonName + dnsNames from helm.values)

What stays out of Git (Tower kubectl-applies via kubeconfig at
Connect time, sourced from the tenant's Vault paths):
  - cloudflare-api-token Secret (cert-manager ns)
  - s3-backup-creds Secret      (tenants ns)

The chart references both Secrets by name only.

Argo health roll-up: a tenant server is "Ready" when this
Application's Health is `Healthy` and the tenants-wildcard
Certificate's Ready condition is True. Tower's Server card UI
will surface this as "Provisioning…" → "Ready" in B4.

Lint + template clean with a real tenant value set; clean with
empty values too (templates skip themselves so a default-rendered
chart doesn't fail without a tenant).
This commit is contained in:
pro-777
2026-04-29 15:09:33 +03:00
parent 0c17429d4c
commit eccb648276
7 changed files with 152 additions and 2 deletions

View File

@@ -0,0 +1,31 @@
{{- if .Values.tenant.wildcardHost }}
# tenants-wildcard Certificate — issued ONCE per cluster, referenced
# by every tenant instance's IngressRoute. Avoids Let's Encrypt's
# 50-cert/week per-domain rate limit as the cluster scales to many
# instances under one tenant.
#
# DNS-01 takes 3090 s in normal Cloudflare conditions; cert-manager
# retries forever on transient failures. The Argo Application that
# deploys this chart is "Healthy" only when the Certificate's Ready
# condition flips to True — Tower's UI uses that as the
# "Provisioning → Ready" transition for the Server card.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: tenants-wildcard
namespace: tenants
labels:
app.kubernetes.io/managed-by: cluster-platform-v3
spec:
secretName: tenants-wildcard-tls
issuerRef:
name: letsencrypt-prod
kind: ClusterIssuer
commonName: {{ .Values.tenant.wildcardHost | quote }}
dnsNames:
- {{ .Values.tenant.wildcardHost | quote }}
# Renew 30 days before expiry — Let's Encrypt certs are 90-day, so
# this gives cert-manager a 30-day window to retry if Cloudflare
# has a bad day during renewal.
renewBefore: 720h
{{- end }}