cluster-platform-v3/templates/longhorn-recurringjobs.yaml

{{- /*
Phase 5 of ADR 0003: two-layer protection for tenant volumes.

  Layer A: local CoW snapshots (`task: snapshot`)
    Hourly. Instant. Zero blocking. Block-level CoW means the snapshot
    shares blocks with live data; only diverged writes consume new
    space. That makes it cheap to retain 24 hourly snapshots, i.e.
    roughly a day of granular undo.

  Layer B: async S3 backup (`task: backup`)
    Daily. Block-incremental. The customer's workflow never waits on
    the upload; Longhorn streams blocks to the configured S3 target
    in the background. This keeps the cluster's data durable
    off-cluster even if Layer A snapshots are wiped (e.g. server
    reformat). Seven daily backups are retained. The job is rendered
    only when `longhorn.defaultSettings.backupTarget` is set.

Both layers are independent of Tower's existing application-level
pg_dump backup. The application backup captures higher-level
semantic state (schema-aware, restorable to a different PG major
or cluster) and is what makes cross-cluster migration possible.
The Longhorn layers capture block-level state and are what make
fast same-cluster restore possible. Both run; the customer keeps
both. Decision 0002 (Standard tier ships always-on durable backup)
is satisfied by the application layer alone; Longhorn-S3 is the
velocity-and-redundancy upgrade.

Both jobs target Longhorn's `default` group, which auto-includes
every volume with no explicit recurring-job reference, so the
schedule applies to existing AND future tenant PVCs without
per-instance operator action.
*/ -}}
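{{- /*
Sketch of explicit group enrollment (an illustration, not part of
this chart). The note above covers volumes with no recurring-job
reference. For a volume that does carry its own references, Longhorn
assigns jobs and groups by label on the Volume object. To the best of
our knowledge the group label has the form below; check it against
the Longhorn version actually deployed before relying on it.

  metadata:
    labels:
      recurring-job-group.longhorn.io/default: enabled
*/ -}}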
{{- if .Values.longhorn.enabled }}
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: tenants-snapshot-hourly
  namespace: longhorn-system
spec:
  cron: "0 * * * *"   # top of every hour
  task: snapshot      # Layer A: local CoW snapshot
  groups: [default]
  retain: 24          # keep the 24 most recent snapshots per volume
  concurrency: 2
{{- if .Values.longhorn.defaultSettings.backupTarget }}
---
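apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: tenants-backup-daily
  namespace: longhorn-system
spec:
  cron: "0 3 * * *"   # once a day at 03:00
  task: backup        # Layer B: block-incremental backup to the S3 target
  groups: [default]
  retain: 7           # keep the 7 most recent backups per volume
  concurrency: 2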
{{- end }}
{{- end }}
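{{- /*
Values sketch. This template reads only the two keys shown below; the
rest is an assumption for illustration and is not taken from this
chart's values.yaml. The bucket, region, and path are hypothetical
placeholders, written in Longhorn's `s3://<bucket>@<region>/<path>`
backup-target form.

  longhorn:
    enabled: true
    defaultSettings:
      backupTarget: s3://example-backups@us-east-1/longhorn

With `longhorn.enabled` true and `backupTarget` empty, only
tenants-snapshot-hourly is rendered; setting `backupTarget` adds
tenants-backup-daily. With `longhorn.enabled` false, neither job is
rendered.
*/ -}}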