fix(tra-974): pin preview backend LOG_LEVEL=info to stop debug log firehose #157
Merged
…rehose

Preview detects as EnvDev, so the Go backend logger defaults to debug (logger/config.go) and emits a per-message "ingest message processed" line ~3/sec, 24/7: a Cloud Logging ingest cost with no debugging value at steady state, and it drowns other sessions troubleshooting on preview.

Root cause found while implementing: the chart's `config.logLevel` only ever emitted BACKEND_LOG_LEVEL, which the current backend does NOT read. This was proven live: preview's ConfigMap already had BACKEND_LOG_LEVEL=info yet the pod logged debug, and an explicit LOG_LEVEL=info env is what silenced it. So a naive `config.logLevel` pin would have been a no-op.

Fix (preview-only, prod untouched):
- chart: new optional `config.runtimeLogLevel` that emits a real LOG_LEVEL ConfigMap key ONLY when set. Empty default → envs that omit it (prod) render no LOG_LEVEL key and keep their APP_ENV→warn default, byte-identical to today (no prod ConfigMap change, no prod pod bounce).
- root app: per-env `logLevel: info` for preview, appended to the config block via the existing no-env-conditional values idiom (like jwtExpirationSeconds).

The legacy/dead BACKEND_LOG_LEVEL key is left in place and flagged for a separate cleanup (TRA-974). The backend-side env-detection fix (preview gets its own env defaulting to info) is the platform-repo option (b), out of scope.

Verified via helm template: default render emits no LOG_LEVEL; preview app inlineValues carry runtimeLogLevel=info → chart emits LOG_LEVEL="info"; prod emits neither.

NOTE: root-template change. Requires `scripts/apply-root-app.sh gke` post-merge to materialize (root/templates/* don't auto-sync).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
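
The root cause above comes down to which environment variable the logger consults. The following is a minimal Go sketch of that precedence, not the actual code in logger/config.go: `resolveLevel`, `envFrom`, and the `APP_ENV` values `production` and `preview` are assumptions made for illustration. It only models the observed behavior: an explicit LOG_LEVEL wins, BACKEND_LOG_LEVEL is never read, and otherwise the level falls back to an environment-based default.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveLevel is a hypothetical stand-in for the backend's level selection.
// lookup abstracts os.LookupEnv so the precedence is easy to exercise.
func resolveLevel(lookup func(string) (string, bool)) string {
	// An explicit, non-empty LOG_LEVEL overrides any environment-based default.
	if v, ok := lookup("LOG_LEVEL"); ok && strings.TrimSpace(v) != "" {
		return strings.ToLower(strings.TrimSpace(v))
	}
	// BACKEND_LOG_LEVEL is deliberately never consulted here.
	// Assumed defaults: "production" maps to warn; anything else is
	// treated as a dev environment and maps to debug.
	env, _ := lookup("APP_ENV")
	if env == "production" {
		return "warn"
	}
	return "debug"
}

// envFrom builds a lookup function over a fixed map, for demonstration only.
func envFrom(m map[string]string) func(string) (string, bool) {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

func main() {
	legacyOnly := envFrom(map[string]string{"APP_ENV": "preview", "BACKEND_LOG_LEVEL": "info"})
	pinned := envFrom(map[string]string{"APP_ENV": "preview", "LOG_LEVEL": "info"})

	fmt.Println(resolveLevel(legacyOnly)) // the legacy key alone changes nothing
	fmt.Println(resolveLevel(pinned))     // the explicit key takes effect
}
```

Under these assumptions, setting only BACKEND_LOG_LEVEL leaves the level at the dev default, which is why pinning `config.logLevel` alone would not have quieted preview.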
Problem
Preview detects as EnvDev, so the Go backend logger defaults to debug (`logger/config.go`) and emits a per-message `ingest message processed` line ~3/sec, 24/7. That's a GCP Cloud Logging ingest cost with no steady-state value, and it drowns other sessions troubleshooting on preview. Decision: preview should be quiet by default and escalate to debug only during an active debugging window.
Root cause (found while implementing; would have been a no-op otherwise)
The chart's `config.logLevel` only ever emitted `BACKEND_LOG_LEVEL`, which the current backend does not read. Proven against the live cluster: preview's ConfigMap already had `BACKEND_LOG_LEVEL=info`, yet the pod logged at debug, and an explicit `LOG_LEVEL=info` env (the stopgap `kubectl set env`) is what actually silenced it. So pinning `config.logLevel` would have shipped a silent no-op.
Fix: preview-only, prod untouched
- chart (`helm/trakrf-backend`): new optional `config.runtimeLogLevel` that emits a real `LOG_LEVEL` ConfigMap key only when set. Empty default → envs that omit it (prod) render no `LOG_LEVEL` key and keep their `APP_ENV`→warn default, so the ConfigMap is byte-identical to today and no prod pod bounces.
- root app (`argocd/root`): per-env `logLevel: info` for preview, appended to the config block via the template's existing no-env-conditional values idiom (same pattern as `jwtExpirationSeconds`).
The legacy/dead `BACKEND_LOG_LEVEL` key is left in place and flagged for a separate cleanup. The backend-side detection fix (preview gets its own env defaulting to info) is the platform-repo option (b), out of scope here.
Verification (`helm template`)
Deploy note
Root-template change → does not auto-sync. Requires `scripts/apply-root-app.sh gke` post-merge to materialize (re-renders all env child apps; only preview's ConfigMap actually changes, so only the preview backend pod bounces). The interim `kubectl set env LOG_LEVEL=info` override on preview keeps it quiet until then.
Tracking: TRA-974.
🤖 Generated with Claude Code