fix(db): bump preview CNPG storage 10Gi→20Gi (reconcile live patch)#159
Open
mikestankavich wants to merge 1 commit into
Preview's single data+WAL PVC filled. CNPG logged "no free disk space for WALs" and refused to start Postgres, so the backend went 0/1 Ready and app.preview.trakrf.id served HTTP 503 "no available server".

Restored service by live-patching the Cluster CR + PVC to 20Gi. This commit reconciles git so ArgoCD stops showing drift (PVCs can't shrink, so this is the floor regardless).

Adds a per-env dbCluster.storageSize override plumbed through the root app's inlineValues; unset on prod, which keeps the chart's 10Gi default.

Preview fills from e2e/Playwright churn; 20Gi is headroom, not a fix. Periodic prune + disk monitoring are tracked separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
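For readers who want to see what the live patch amounts to, here is a minimal sketch of the two patches involved: one against the CNPG Cluster CR (`spec.storage.size`) and one against the PVC (`spec.resources.requests.storage`). The namespace `trakrf-preview` is hypothetical (the PR does not state it), and the PVC name assumes CNPG's convention of naming the PVC after the instance pod `trakrf-db-preview-1`. The script only prints the commands unless `APPLY=1` is set, and it assumes the storage class allows volume expansion.

```shell
#!/usr/bin/env sh
# Sketch of the live patch: grow the CNPG Cluster CR and its PVC to SIZE.
# Assumptions (not from the PR): the namespace, and that the PVC shares
# the instance pod's name.
set -eu

NAMESPACE="${NAMESPACE:-trakrf-preview}"   # hypothetical namespace
CLUSTER="${CLUSTER:-trakrf-db-preview}"
PVC="${PVC:-${CLUSTER}-1}"                 # assumed PVC name
SIZE="${SIZE:-20Gi}"

# JSON merge patch for the Cluster CR (spec.storage.size).
cluster_patch() {
  printf '{"spec":{"storage":{"size":"%s"}}}' "$1"
}

# JSON merge patch for the PVC (spec.resources.requests.storage).
pvc_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Print the command by default; execute it only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# 1. Raise the declared size on the Cluster CR.
run kubectl -n "$NAMESPACE" patch clusters.postgresql.cnpg.io "$CLUSTER" \
  --type merge -p "$(cluster_patch "$SIZE")"

# 2. Raise the PVC request directly so the CSI driver resizes the volume.
run kubectl -n "$NAMESPACE" patch pvc "$PVC" \
  --type merge -p "$(pvc_patch "$SIZE")"
```

Running it without `APPLY=1` is a dry run that shows the exact commands, which is a reasonable thing to paste into an incident note before touching the cluster.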
What
Bumps the preview CNPG cluster (trakrf-db-preview) storage from 10Gi → 20Gi via a new per-env dbCluster.storageSize override. Prod is unchanged (no override → chart's 10Gi default).
Why
On 2026-06-20 preview went offline (app.preview.trakrf.id → HTTP 503 "no available server").
Root cause: preview's single data+WAL PVC was full (9.5G). CNPG logged "no free disk space for WALs" and refused to start Postgres → trakrf-db-preview-1 CrashLoopBackOff → backend 0/1 Ready → Traefik served 503. Not a build/deploy issue (CI was green).
The fill is tag_scans = 8.6 GB (91% of the DB), the raw-read firehose from the ongoing geofence read test (uncompressed, 30-day retention, ~1.1 KB/row jsonb). Application tables total only ~30 MB. 20Gi is headroom, not the fix: the durable fix is Timescale compression + shorter retention on tag_scans, tracked in TRA-921 (likely 7-day retention + compress after 1–2 days).
Fix sequence
Cluster CR to 20Gi; CNPG didn't propagate to the PVC, so also patched the PVC requests.storage directly to trigger the CSI resize. Volume grew to 20Gi, Postgres started, cluster returned to "Cluster in healthy state", backend 1/1, app root → HTTP 200.
Notes
inlineValues. Verified with helm template argocd/root --set cluster=gke: preview renders size: "20Gi", prod renders no size key.
Follow-up tickets
tag_scans/asset_scans (the actual fix; the geofence firehose will refill 20Gi otherwise).
🤖 Generated with Claude Code
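The render check mentioned in the notes above could be scripted along these lines. This is a sketch, not the author's tooling: `rendered_sizes` is a hypothetical helper that prints the value of every `size:` line it finds on stdin, so it may also pick up unrelated `size:` keys if the chart renders any. The `helm template` invocation is the one quoted in the PR and runs only if `helm` and the `argocd/root` chart directory are present.

```shell
#!/usr/bin/env sh
# Sketch: list every `size:` value in rendered manifests read from stdin.
set -eu

rendered_sizes() {
  # Matches lines such as `size: "20Gi"` or `size: 20Gi` (any indentation)
  # and prints only the bare value.
  sed -n 's/^[[:space:]]*size:[[:space:]]*"\{0,1\}\([^"[:space:]]*\)"\{0,1\}[[:space:]]*$/\1/p'
}

# Render the root app as in the PR and show the sizes that come out.
# Skipped when helm or the chart directory is not available.
if command -v helm >/dev/null 2>&1 && [ -d argocd/root ]; then
  helm template argocd/root --set cluster=gke | rendered_sizes
fi
```

Per the PR's own verification, the expectation is a 20Gi entry coming from preview and no size key coming from prod, so a check like this would flag an accidental prod override as an extra line of output.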