feat(backend): RESEND_API_KEY PreSync guard for deployed envs (TRA-972)#155

Merged
mikestankavich merged 3 commits into main from feat/tra-972-email-key-presync-guard
Jun 10, 2026

@mikestankavich

What

Adds a PreSync guard that fails the ArgoCD sync loudly when RESEND_API_KEY is empty/missing in a deployed env, instead of letting transactional email (org invites, password resets) silently fail.

Why

TRA-972: email send is best-effort — the backend logs Resend [ERROR]: API key is invalid but still returns 201/200, so the UI shows success while nothing is delivered. An empty key is invisible until a user reports a missing email.

infra#154 (omit-when-empty + per-env ignoreDifferences) prevents one cause — ArgoCD clobbering the key to "". This PR detects the whole regression class: a missed out-of-band injection, a deleted key, or a botched cutover. It's the "cutover gate" in the ticket title.

How

  • helm/trakrf-backend/templates/email-guard-job.yaml: a pre-install,pre-upgrade hook Job (ArgoCD PreSync, same mapping as migrate-job; hook-weight: -10 so it runs first). It receives RESEND_API_KEY from the trakrf-backend Secret via secretKeyRef with optional: true. The optional flag is load-bearing: infra#154 omits the key when empty, so a hard ref would die with an opaque CreateContainerConfigError, whereas optional yields an empty env var that the check catches (it also catches a present-but-empty key). The Job then runs [ -z "$RESEND_API_KEY" ] && exit 1 with a clear FATAL message.
    • Presence, not validity: a send-scoped re_... key can't be read-only validated (Resend /domains returns 401 restricted_api_key) and a hook must not send a real email.
    • busybox pinned + multi-arch — GKE nodes are arm64 (T2A/Axion) and the backend Go image is shell-less. Hardened pod (runAsNonRoot, drop ALL, RO rootfs); tolerations passthrough so it schedules on the ARM-tainted pool.
  • emailGuard.{enabled,image} chart value, default disabled (local/eks/aks render nothing).
  • argocd/root: emailGuardEnabled per-env (preview=true now; prod=false with a flip-after-injection comment, since an active guard would block live prod syncs until the key is present). Mirrors the mqttEnabled staging convention.
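
As a rough illustration of the presence check described above, the guard's shell logic might look like the sketch below. This is not the script from email-guard-job.yaml: the function name check_resend_key and the exact FATAL wording are assumptions made for illustration. It sticks to POSIX sh so it would run under busybox.

```shell
#!/bin/sh
# Hypothetical sketch of the guard's presence check (not the chart's script).
# With secretKeyRef optional: true, a missing Secret key arrives here as an
# unset or empty RESEND_API_KEY, and both cases are treated as failure.
check_resend_key() {
  # ${VAR:-} expands to "" when VAR is unset, so unset and empty look alike.
  if [ -z "${RESEND_API_KEY:-}" ]; then
    echo "FATAL: RESEND_API_KEY is empty or missing; transactional email would silently fail" >&2
    return 1
  fi
  # Presence only: the key's value is never printed and never validated.
  echo "RESEND_API_KEY is present"
  return 0
}
```

In a hook Job the container command would presumably end with something like `check_resend_key || exit 1`, so that a non-zero exit fails the PreSync hook and, with it, the sync.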

⚠️ Rollout order (critical)

preview + prod currently hold an empty key, so an active guard blocks their sync. Required order:

  1. Merge infra#154 (omit-when-empty + ignoreDifferences).
  2. Inject the real re_... key out-of-band into preview (then prod, on Mike's timing).
  3. Merge this PR (held for review → lands last → no deadlock).

Prod's flag stays false here; flip it only after the prod key is injected (apply-root-app.sh is cluster-wide + manual, so injecting first is the operator's responsibility).

Verification

  • helm template omits the Job when disabled (default / eks / aks); renders it correctly when enabled — PreSync hook annotations, optional: true secretRef, -z check, pinned busybox.
  • Guard shell logic exits 1 on empty/unset, 0 when set.
  • argocd/root (gke) gives only the preview backend app emailGuard.enabled: true; prod has none; eks/aks render clean.
  • helm lint + helm template clean on eks+aks (CI parity).
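
The shell-logic check listed above can be reproduced locally along these lines. The sketch wraps the one-liner quoted in the How section in `sh -c` and appends an explicit `exit 0`. The helper name guard_status and the dummy value re_dummy are hypothetical, and the PR's actual verification steps may have differed.

```shell
#!/bin/sh
# The guard expression quoted in the PR description, plus an explicit exit 0.
# Single quotes keep $RESEND_API_KEY unexpanded until the child shell runs it.
GUARD='[ -z "$RESEND_API_KEY" ] && exit 1; exit 0'

# Usage: guard_status unset | guard_status set VALUE
# Prints the exit status the guard would return for that environment.
guard_status() {
  status=0
  if [ "$1" = "unset" ]; then
    ( unset RESEND_API_KEY; sh -c "$GUARD" ) || status=$?
  else
    ( RESEND_API_KEY="$2"; export RESEND_API_KEY; sh -c "$GUARD" ) || status=$?
  fi
  echo "$status"
}

echo "unset: $(guard_status unset)"          # unset: 1
echo "empty: $(guard_status set '')"         # empty: 1
echo "set:   $(guard_status set 're_dummy')" # set:   0
```

Each case runs in a subshell, so the caller's own RESEND_API_KEY (if any) is left untouched.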

Design spec: docs/superpowers/specs/2026-06-10-tra-972-email-key-presync-guard-design.md

🤖 Generated with Claude Code

Mike Stankavich and others added 3 commits June 10, 2026 13:48
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Turns an empty/missing RESEND_API_KEY in a deployed env from a silent,
best-effort runtime warning (the backend logs Resend "API key is
invalid" but still returns 2xx, so the UI shows success and no email is
delivered) into a LOUD ArgoCD sync failure, surfaced before the broken
state goes live. This is the cutover gate named in TRA-972.

Belt-and-suspenders to infra#154 (omit-when-empty + ignoreDifferences):
#154 PREVENTS one cause (ArgoCD clobbering the key to ""); this DETECTS
the whole regression class — a missed out-of-band injection, a deleted
key, or a botched cutover.

- helm/trakrf-backend/templates/email-guard-job.yaml: a pre-install/
  pre-upgrade hook Job (ArgoCD PreSync, like migrate-job; weight -10 so
  it runs first). Reads RESEND_API_KEY from the trakrf-backend Secret via
  secretKeyRef optional:true (load-bearing: #154 OMITS the key when empty,
  so a hard ref would die with an opaque CreateContainerConfigError;
  optional yields an empty env the check catches, and also catches a
  present-but-empty key). Fails with a clear message + exit 1 when empty.
  Checks PRESENCE not validity (a send-scoped key can't be read-only
  validated; a hook must not send a real email). busybox pinned + multi-
  arch (GKE nodes are arm64; the backend Go image is shell-less).
  Hardened pod (runAsNonRoot, drop ALL, RO rootfs), tolerations
  passthrough so it schedules on the ARM-tainted pool.
- emailGuard.{enabled,image} chart value, default disabled.
- argocd/root: emailGuardEnabled per-env (preview=true now; prod=false
  with a flip-after-injection comment — an active guard would block live
  prod syncs until the key is present). Mirrors the mqttEnabled staging
  convention.

Rollout order (critical): merge infra#154 -> inject the real key into
preview (then prod, on Mike's timing) -> merge THIS PR. The guard PR is
held for review so it lands last; no deadlock.

Verified: helm template omits the Job when disabled (default/eks/aks)
and renders it correctly when enabled (PreSync hook, optional secretRef,
-z check, pinned busybox); guard shell logic exits 1 on empty/unset and
0 when set; argocd/root gives only the preview backend app
emailGuard.enabled:true; helm lint + template clean on eks+aks (CI
parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prod RESEND_API_KEY was injected out-of-band 2026-06-10 (Mike green-lit
deploy to both preview + prod), so the PreSync guard now passes in prod.
Flip prod emailGuardEnabled false->true so the cutover gate protects prod
as well as preview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mikestankavich merged commit b01c920 into main Jun 10, 2026
19 checks passed
@mikestankavich deleted the feat/tra-972-email-key-presync-guard branch June 10, 2026 19:13