[Proposal] Workflow template engine: Mustache lambda helpers for missing-key defaults #5682

@hakatt

Workflow template engine: missing-key handling for custom/enriched fields

Summary

Workflow templates that reference custom or enriched alert fields (fields not guaranteed by the upstream provider schema) currently crash with RenderException because render_context() always renders with safe=True. There is no way for a workflow author to express a default value for a field that may or may not be present.

Background: two distinct categories of missing fields

Alert fields in Keep come from two different sources, and they need to be handled differently:

1. Provider schema fields

Fields that are part of the upstream service's alert schema — e.g. Grafana's panelUrl, dashboardUrl, valueString; Prometheus's instance, job, value. These fields:

  • Have a known, documented shape from the upstream provider
  • Should always be present if the provider's _format_alert() is correct
  • When absent, indicate a gap in the provider implementation or a schema change upstream

Correct handling: the provider is responsible. Each provider should explicitly default any field that may be absent in _format_alert(), and the absence should be logged so that provider drift from upstream is visible and can be fixed. This is already the pattern in the codebase and is the right place to enforce schema contracts.
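As a rough illustration of the provider-side pattern described above, the sketch below defaults absent schema fields and logs each one so that drift stays visible. The helper name apply_schema_defaults and the GRAFANA_SCHEMA_DEFAULTS table are hypothetical, not existing Keep code; a real provider would do the equivalent inside its own _format_alert().

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical defaults table; field names are taken from the examples above.
GRAFANA_SCHEMA_DEFAULTS = {"panelUrl": "", "dashboardUrl": "", "valueString": ""}


def apply_schema_defaults(raw_alert: dict, defaults: dict, provider: str) -> dict:
    """Return a copy of raw_alert with absent schema fields defaulted and logged."""
    alert = dict(raw_alert)  # do not mutate the caller's payload
    for field, default in defaults.items():
        if alert.get(field) is None:
            # Log every defaulted field so upstream schema drift is visible.
            logger.warning(
                "%s alert is missing schema field %r; defaulting to %r",
                provider, field, default,
            )
            alert[field] = default
    return alert


formatted = apply_schema_defaults(
    {"panelUrl": "https://grafana.example/d/1"}, GRAFANA_SCHEMA_DEFAULTS, "grafana"
)
```

Custom and enriched fields deliberately stay out of a table like this, since no provider knows about them.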

2. Custom and enriched fields

Fields that a workflow author adds via enrich_alert, custom labels, calculated values, or integration-specific enrichments. Examples: slack_timestamp (stored after first Slack post), tenant-specific label mappings, fields from a secondary lookup step. These fields:

  • Are not part of any provider schema
  • May or may not be present depending on alert history and workflow state
  • Are inherently optional from the workflow author's perspective

No provider can default these — they are outside the scope of any upstream service. The workflow template itself is the only appropriate place to handle their optionality.

The problem

render_context() in iohandler.py hardcodes safe=True for all string with: parameters. When Chevron encounters a missing key under safe=True, it writes a warning to stderr, and _render() converts that warning into a RenderException, aborting the action.
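The failure path described above can be sketched without Keep or Chevron installed. Everything below is a stand-in under stated assumptions: _stand_in_render is a toy replacement for the Chevron call, the warning text is illustrative, and render_strict only mimics the shape of _render() (capture stderr, raise if anything was written).

```python
import contextlib
import io
import re
import sys


class RenderException(Exception):
    """Stand-in for Keep's exception of the same name."""


def _stand_in_render(template: str, context: dict, warn: bool) -> str:
    """Toy renderer: resolves {{ a.b }} and warns on stderr for missing keys."""
    def resolve(match):
        value = context
        for part in match.group(1).split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                if warn:
                    print(f"missing key: {match.group(1)}", file=sys.stderr)
                return ""
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", resolve, template)


def render_strict(template: str, context: dict, safe: bool = True) -> str:
    """Render, turning any stderr warning into a RenderException when safe."""
    captured = io.StringIO()
    with contextlib.redirect_stderr(captured):
        rendered = _stand_in_render(template, context, warn=safe)
    warnings = captured.getvalue().strip()
    if safe and warnings:
        raise RenderException(warnings)
    return rendered


context = {"alert": {"name": "cpu"}}
print(render_strict("Alert: {{ alert.name }}", context))
try:
    render_strict("Thread: {{ alert.slack_timestamp }}", context)
except RenderException as exc:
    print(f"action aborted: {exc}")
```

With safe=False the same template renders "Thread: " silently, which is the typo-swallowing behaviour a global switch would introduce.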

There is currently no supported syntax in workflow YAML for a workflow author to say: "render this field, but if it's missing use a default value". The keep.* function system cannot help here because it is evaluated after Mustache rendering: by the time keep.default(...) would run, the RenderException has already been raised.

Proposed solution

A two-part approach that keeps the two categories cleanly separated:

Part A — Provider-level defaulting (already the right pattern, no change needed)

Providers continue to own and default their schema fields in _format_alert(). If a provider emits an alert without a field that is part of its upstream schema, the provider implementation should be fixed. This keeps schema contracts explicit and auditable.

Part B — Mustache section helpers in workflow context (new)

Inject a set of named lambda helpers into the Chevron render context for every workflow render. These use the Mustache lambda spec — a callable used as a section tag renders the inner content and can transform or default the result:

# Empty or missing → "N/A"
message: "Panel: {{#fn.na}}{{ alert.panelUrl }}{{/fn.na}}"

# Empty or missing → ""
message: "Thread: {{#fn.default}}{{ alert.slack_timestamp }}{{/fn.default}}"

# Transform
message: "Severity: {{#fn.upper}}{{ alert.severity }}{{/fn.upper}}"

Proposed initial helper set injected globally into render context:

WORKFLOW_HELPERS = {
    "fn": {
        "default": lambda text, render: render(text) or "",
        "na":      lambda text, render: render(text) or "N/A",
        "upper":   lambda text, render: render(text).upper(),
        "lower":   lambda text, render: render(text).lower(),
        "strip":   lambda text, render: render(text).strip(),
    }
}

These lambdas receive the raw inner template text together with a render callable. Calling render(text) resolves the fields, so each helper operates on the result after field resolution. An absent key renders to "" inside the lambda, because the lambda call path renders with safe=False and no RenderException is raised on that path; the lambda then applies the default.
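To make the (text, render) contract concrete, the sketch below calls a subset of the helpers directly. make_render is a hypothetical stand-in for the render callable that Chevron would pass in; it resolves dotted keys leniently (missing key becomes "") and is not Chevron's implementation.

```python
import re

# Subset of the helper set proposed above.
WORKFLOW_HELPERS = {
    "fn": {
        "default": lambda text, render: render(text) or "",
        "na": lambda text, render: render(text) or "N/A",
        "upper": lambda text, render: render(text).upper(),
    }
}


def make_render(context: dict):
    """Build a stand-in for the render callable handed to a section lambda."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return ""  # lenient path: a missing key renders as empty
            value = value[part]
        return str(value)

    return lambda text: re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, text)


render = make_render({"alert": {"severity": "critical"}})
fn = WORKFLOW_HELPERS["fn"]

print(fn["na"]("{{ alert.panelUrl }}", render))     # N/A
print(fn["upper"]("{{ alert.severity }}", render))  # CRITICAL
```

One consequence of the `or` idiom worth keeping in mind: literal whitespace or text inside the section makes the rendered string non-empty, so the default is not applied. Stripping before the emptiness check would be one way to handle that, if it is wanted.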

Required changes

Backend (keep/iohandler/iohandler.py):

  • Define WORKFLOW_HELPERS dict with the lambda set
  • Merge it into the context passed to chevron.render() in render_context() and render_recursively()
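A minimal sketch of the merge step, assuming the render context is a plain dict. The function name with_helpers and the choice to reject a context that already defines "fn" are assumptions for illustration; the actual implementation may prefer to let one side win silently.

```python
# Trimmed helper set; see the proposed WORKFLOW_HELPERS above for the full list.
WORKFLOW_HELPERS = {
    "fn": {
        "na": lambda text, render: render(text) or "N/A",
    }
}


def with_helpers(context: dict, helpers: dict) -> dict:
    """Return a new render context with helpers added; the input is not mutated."""
    clashes = helpers.keys() & context.keys()
    if clashes:
        # Assumption: a clash on a reserved key is an error rather than an override.
        raise ValueError(f"context already defines reserved keys: {sorted(clashes)}")
    return {**context, **helpers}


context = {"alert": {"name": "cpu"}}
render_ctx = with_helpers(context, WORKFLOW_HELPERS)
print(sorted(render_ctx))  # ['alert', 'fn']
```

Building a new dict keeps the helpers out of any context object that is later persisted or logged.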

Frontend (keep-ui/entities/workflows/lib/mustache.ts):

  • extractMustacheVariables() currently validates every token matched by MUSTACHE_REGEX against ALLOWED_MUSTACHE_VARIABLE_REGEX = /^[a-zA-Z0-9._-\s]+$/
  • Section open {{#fn.na}}, section close {{/fn.na}}, and inverted {{^...}} tokens start with #, /, ^ and fail this regex, producing spurious "invalid variable" warnings in the builder
  • Fix: filter out tokens that start with #, /, ^, !, > before applying the variable name validation — these are Mustache sigils, not variable references
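The filtering logic is sketched here in Python for consistency with the backend snippets; the real change belongs in mustache.ts. MUSTACHE_TOKEN is an assumed pattern (the actual MUSTACHE_REGEX is not shown in this issue), and the allowed-name pattern escapes the hyphen because Python's re rejects the `_-\s` range that the JavaScript regex tolerates.

```python
import re

MUSTACHE_TOKEN = re.compile(r"\{\{\s*(.*?)\s*\}\}")     # assumed token pattern
ALLOWED_VARIABLE = re.compile(r"^[a-zA-Z0-9._\-\s]+$")  # hyphen escaped for Python
SIGILS = ("#", "/", "^", "!", ">")                      # Mustache tag sigils


def extract_variables(template: str) -> tuple:
    """Return (valid_variables, invalid_tokens), skipping sigil-prefixed tags."""
    variables, invalid = [], []
    for token in MUSTACHE_TOKEN.findall(template):
        if token.startswith(SIGILS):
            continue  # section open/close, inverted, comment, partial
        if ALLOWED_VARIABLE.match(token):
            variables.append(token)
        else:
            invalid.append(token)
    return variables, invalid


template = "Panel: {{#fn.na}}{{ alert.panelUrl }}{{/fn.na}}"
print(extract_variables(template))  # (['alert.panelUrl'], [])
```

Skipping sigil tokens before validation means genuinely malformed variable names are still reported, while section tags no longer produce warnings.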

What this does not change

  • Existing workflows with no fn.* references are unaffected
  • Provider _format_alert() defaulting remains the correct place for schema fields
  • The keep.* function preprocessor is unaffected
  • No changes to deduplication, fingerprinting, or alert storage

Alternatives considered

| Approach | Why not chosen |
| --- | --- |
| Global safe=False | Silently swallows template typos: {{ alert.naem }} renders "" with no feedback |
| Per-workflow safe: false flag | Authors can opt out of safety with no defaulting syntax; schema change required |
| Pre-render context patch (scan for missing keys, inject "") | Two-pass render complexity; still no per-field default control |
| Jinja2 migration | The default("N/A") filter is the right UX, but it would be a breaking migration for all existing workflows |

Status

  • Chevron lambda behaviour verified locally (flat and nested dict helpers, warn=False path)
  • Backend injection implemented in keep/iohandler/iohandler.py: WORKFLOW_HELPERS merged into Chevron context; safe=False auto-enabled when {{#fn. is detected
  • Frontend mustache.ts fix implemented — sigil tokens filtered in extractMustacheVariables()
  • Workflow example: {{#fn.default}}{{ alert.silenceURL }}{{/fn.default}} deployed on dev (Neo Grafana → Slack workflow, revision 4)
  • Validated end-to-end on dev environment (all five helpers: fn.default, fn.na, fn.upper, fn.lower, fn.strip — exercised by real alert firing on dev Keep 0.49.0-svt.10)
