feat(backend): RESEND_API_KEY PreSync guard for deployed envs (TRA-972) #155
Merged
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Turns an empty/missing RESEND_API_KEY in a deployed env from a silent, best-effort runtime warning (the backend logs Resend "API key is invalid" but still returns 2xx, so the UI shows success and no email is delivered) into a LOUD ArgoCD sync failure, surfaced before the broken state goes live. This is the cutover gate named in TRA-972.

Belt-and-suspenders to infra#154 (omit-when-empty + ignoreDifferences): #154 PREVENTS one cause (ArgoCD clobbering the key to ""); this DETECTS the whole regression class: a missed out-of-band injection, a deleted key, or a botched cutover.

- helm/trakrf-backend/templates/email-guard-job.yaml: a pre-install/pre-upgrade hook Job (ArgoCD PreSync, like migrate-job; weight -10 so it runs first). Reads RESEND_API_KEY from the trakrf-backend Secret via secretKeyRef optional:true (load-bearing: #154 OMITS the key when empty, so a hard ref would die with an opaque CreateContainerConfigError; optional yields an empty env the check catches, and also catches a present-but-empty key). Fails with a clear message + exit 1 when empty. Checks PRESENCE not validity (a send-scoped key can't be read-only validated; a hook must not send a real email). busybox pinned + multi-arch (GKE nodes are arm64; the backend Go image is shell-less). Hardened pod (runAsNonRoot, drop ALL, RO rootfs), tolerations passthrough so it schedules on the ARM-tainted pool.
- emailGuard.{enabled,image} chart value, default disabled.
- argocd/root: emailGuardEnabled per-env (preview=true now; prod=false with a flip-after-injection comment, because an active guard would block live prod syncs until the key is present). Mirrors the mqttEnabled staging convention.

Rollout order (critical): merge infra#154 -> inject the real key into preview (then prod, on Mike's timing) -> merge THIS PR. The guard PR is held for review so it lands last; no deadlock.

Verified: helm template omits the Job when disabled (default/eks/aks) and renders it correctly when enabled (PreSync hook, optional secretRef, -z check, pinned busybox); guard shell logic exits 1 on empty/unset and 0 when set; argocd/root gives only the preview backend app emailGuard.enabled:true; helm lint + template clean on eks+aks (CI parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prod RESEND_API_KEY was injected out-of-band 2026-06-10 (Mike green-lit deploy to both preview + prod), so the PreSync guard now passes in prod. Flip prod emailGuardEnabled false->true so the cutover gate protects prod as well as preview. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What

Adds a PreSync guard that fails the ArgoCD sync loudly when `RESEND_API_KEY` is empty/missing in a deployed env, instead of letting transactional email (org invites, password resets) silently fail.

Why
TRA-972: email send is best-effort. The backend logs Resend `[ERROR]: API key is invalid` but still returns `201`/`200`, so the UI shows success while nothing is delivered. An empty key is invisible until a user reports a missing email.

infra#154 (omit-when-empty + per-env `ignoreDifferences`) prevents one cause: ArgoCD clobbering the key to `""`. This PR detects the whole regression class: a missed out-of-band injection, a deleted key, or a botched cutover. It's the "cutover gate" in the ticket title.

How
- `helm/trakrf-backend/templates/email-guard-job.yaml`: a `pre-install,pre-upgrade` hook Job (ArgoCD PreSync, same mapping as `migrate-job`; `hook-weight: -10` so it runs first). It receives `RESEND_API_KEY` from the `trakrf-backend` Secret via `secretKeyRef` with `optional: true`, then runs `[ -z "$RESEND_API_KEY" ] && exit 1` with a clear FATAL message. The `optional: true` is load-bearing: #154 ("fix(backend): let operator-set RESEND_API_KEY survive ArgoCD sync") omits the key when empty, so a hard ref would die with an opaque `CreateContainerConfigError`; `optional` yields an empty env the check catches, and also catches a present-but-empty key.
- Checks presence, not validity: a send-scoped `re_...` key can't be read-only validated (Resend `/domains` → `401 restricted_api_key`) and a hook must not send a real email.
- `busybox` pinned + multi-arch: GKE nodes are arm64 (T2A/Axion) and the backend Go image is shell-less. Hardened pod (`runAsNonRoot`, drop ALL, RO rootfs); `tolerations` passthrough so it schedules on the ARM-tainted pool.
- `emailGuard.{enabled,image}` chart value, default disabled (local/eks/aks render nothing).
- `argocd/root`: `emailGuardEnabled` per-env: preview=true now; prod=false with a flip-after-injection comment (an active guard would block live prod syncs until the key is present). Mirrors the `mqttEnabled` staging convention.

Rollout order (critical)

Preview + prod currently hold an empty key, so an active guard blocks their sync. Required order:

1. Merge infra#154 (omit-when-empty + `ignoreDifferences`).
2. Inject the real `re_...` key out-of-band into preview (then prod, on Mike's timing).
3. Merge this PR.

Prod's flag stays `false` here; flip it only after the prod key is injected (`apply-root-app.sh` is cluster-wide + manual, so injecting first is the operator's responsibility).

Verification
- `helm template` omits the Job when disabled (default / eks / aks); renders it correctly when enabled: PreSync hook annotations, `optional: true` secretRef, `-z` check, pinned busybox.
- Guard shell logic exits `1` on empty/unset, `0` when set.
- `argocd/root` (gke) gives only the preview backend app `emailGuard.enabled: true`; prod has none; eks/aks render clean.
- `helm lint` + `helm template` clean on eks+aks (CI parity).

Design spec: `docs/superpowers/specs/2026-06-10-tra-972-email-key-presync-guard-design.md`

🤖 Generated with Claude Code
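
For reviewers who want to reason about the presence check without opening the template, here is a minimal sketch of the guard logic in POSIX `sh` (as busybox would run it). It is an illustration only, not the script shipped in `email-guard-job.yaml`: the function name `check_resend_key` and the message wording are assumptions, and only the `-z` test on `RESEND_API_KEY` and the exit codes come from the description above.

```shell
#!/bin/sh
# Hypothetical sketch of the guard check; the real hook script may differ.
# check_resend_key (an illustrative name) returns 1 and prints a FATAL
# message on stderr when RESEND_API_KEY is unset or empty, and returns 0
# otherwise. It checks presence only and never contacts Resend.
check_resend_key() {
  # ${VAR:-} treats "unset" and "set but empty" the same way, which also
  # keeps the check safe under `set -u`.
  if [ -z "${RESEND_API_KEY:-}" ]; then
    echo "FATAL: RESEND_API_KEY is empty or missing; failing the sync" >&2
    return 1
  fi
  echo "RESEND_API_KEY is present (presence only; validity is not checked)"
  return 0
}
```

In a hook container the script would presumably end with `check_resend_key || exit 1`, so that a non-zero exit fails the Job and, with it, the PreSync phase. Because the Secret reference is `optional: true`, a missing key reaches this check as an empty or unset variable rather than blocking container creation.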