feat: configurable ingest-time privacy controls for hook payloads (#148) #166

Open
linkvapeluckyman wants to merge 3 commits into hoangsonww:master from linkvapeluckyman:feat/privacy-controls

Conversation

@linkvapeluckyman

Contributor

Summary

Implements the MVP for #148 — a configurable privacy policy that redacts, hashes, or
drops sensitive data from hook payloads before they reach SQLite or WebSocket clients.

Issue #148 acceptance criteria

  • Hook events are transformed by the policy before being written to SQLite or sent to WebSocket clients (sanitization runs at every event-insert site in routes/hooks.js; summaries are sanitized before both persist and broadcast)
  • Users can create, edit, disable, and delete redaction rules from Settings without editing server code (new Privacy Controls panel)
  • Built-in detectors catch common token/private-key patterns in nested payloads and transcript-derived fields (APIError summaries/payloads from transcripts are sanitized too, with dedup comparing sanitized values)
  • Redacted events retain structured metadata — a _privacy counters block is stamped only when something was redacted; clean payloads are stored byte-identical, so dashboards, filters, analytics, and cost views are unaffected
  • A preview endpoint shows before/after transformations without persisting (POST /api/privacy/preview, also accepts a draft policy so the UI can preview unsaved edits)
  • Import/reimport behavior is explicit — policies apply to live ingest only; the Settings panel and docs state clearly that imported records are written as found on disk
  • Server tests cover nested objects, arrays, large payloads, hash stability, event dropping, broadcast/response output, and fail-safe behavior when a rule is invalid (14 tests; invalid regexes are rejected at save-time so runtime never sees them)
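
The last criterion relies on invalid rules being rejected before they are saved. A minimal sketch of that kind of save-time check follows. The `validateRule` helper and the rule shape (`pattern`, `action`, `target`) are hypothetical names chosen for illustration, and the actual validation in lib/privacy.js may differ:

```javascript
// Hypothetical rule shape: { pattern: string, action: string, target: "key" | "value" }.
// None of these names are taken from lib/privacy.js.
const ACTIONS = new Set(["mask", "hash", "drop_field", "drop_event_payload"]);

function validateRule(rule) {
  const errors = [];
  if (!rule || typeof rule.pattern !== "string" || rule.pattern.length === 0) {
    errors.push("pattern must be a non-empty string");
  } else {
    try {
      // Compiling here means a broken pattern never reaches the ingest path.
      new RegExp(rule.pattern);
    } catch (err) {
      errors.push(`invalid regex: ${err.message}`);
    }
  }
  if (!rule || !ACTIONS.has(rule.action)) {
    errors.push("unknown action");
  }
  // The PR describes drop_field as valid for key-match rules only.
  if (rule && rule.action === "drop_field" && rule.target !== "key") {
    errors.push("drop_field requires a key-match rule");
  }
  return { ok: errors.length === 0, errors };
}
```

Returning a list of errors rather than throwing lets a settings UI show every problem with a draft rule at once.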

Built-in detectors (conservative default: secrets masked out of the box)

  • Secret-named keys (token / secret / password / api_key / auth / credential — same regex family as the Config Explorer redaction)
  • Bearer tokens inside strings
  • Common API-key formats (sk-ant-…, sk-…, ghp_…, github_pat_…, AKIA…, xox…, AIza…)
  • Private-key blocks (-----BEGIN … PRIVATE KEY-----)
  • Email addresses — opt-in (off by default)
  • Home-directory paths (/Users/…, /home/…, C:\Users\…) — opt-in (off by default)
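
To make the detector list concrete, here is a rough sketch of how key-name and value detectors of this kind could be expressed. The regexes, detector names, and the `maskString` / `maskObject` helpers are illustrative assumptions; the PR's real patterns are not shown in this thread and are likely stricter:

```javascript
// Illustrative patterns only; not copied from lib/privacy.js.
const DETECTORS = [
  { name: "bearer", re: /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  {
    name: "api_key",
    re: /\b(?:sk-ant-|sk-|ghp_|github_pat_)[A-Za-z0-9_-]{8,}|\bAKIA[0-9A-Z]{16}\b/g,
  },
  {
    name: "private_key",
    re: /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
  },
];

// Key-name detector: any key containing one of these words is treated as secret.
const SECRET_KEY_NAME = /token|secret|password|api[_-]?key|auth|credential/i;

// Replace every detector match inside a single string.
function maskString(value) {
  let out = value;
  for (const { name, re } of DETECTORS) {
    out = out.replace(re, `[REDACTED:${name}]`);
  }
  return out;
}

// Shallow pass over one object: secret-named keys are masked outright,
// other string values go through the value detectors.
function maskObject(obj) {
  const result = {};
  for (const [key, value] of Object.entries(obj)) {
    if (SECRET_KEY_NAME.test(key)) result[key] = "[REDACTED:secret_key]";
    else if (typeof value === "string") result[key] = maskString(value);
    else result[key] = value;
  }
  return result;
}
```

A substring match on key names is deliberately conservative: in this sketch a key such as `author` would also be masked because it contains `auth`.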

Rule actions

  • mask → [REDACTED:<rule>]
  • hash → stable truncated SHA-256 (sha256:abc…), so values stay correlatable across events
  • drop_field → key removed entirely (key-match rules only)
  • drop_event_payload → payload reduced to a metadata-only stub (covers the issue's preserve_metadata_only semantics)
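
The three field-level actions can be sketched as follows. The 12-character truncation length and the `hashValue` / `applyAction` helper names are assumptions made for illustration. drop_event_payload is left out because the shape of the metadata-only stub is not shown here:

```javascript
const { createHash } = require("node:crypto");

// Stable truncated SHA-256: the same input always yields the same label,
// so hashed values stay correlatable across events.
// The truncation length (12 hex characters) is an assumption.
function hashValue(value) {
  const digest = createHash("sha256").update(String(value), "utf8").digest("hex");
  return `sha256:${digest.slice(0, 12)}`;
}

// Apply one field-level action to obj[key], mutating and returning obj.
function applyAction(action, ruleName, obj, key) {
  switch (action) {
    case "mask":
      obj[key] = `[REDACTED:${ruleName}]`;
      break;
    case "hash":
      obj[key] = hashValue(obj[key]);
      break;
    case "drop_field":
      delete obj[key];
      break;
    default:
      throw new Error(`unsupported action: ${action}`);
  }
  return obj;
}
```

An unsalted hash keeps values correlatable, but a low-entropy value (for example a short numeric ID) could be recovered by brute force, so mask is the safer choice for such fields.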

Safety properties

  • Hook path stays fail-safe and non-blocking: a sanitizer crash degrades to a conservative metadata-only stub — never stores raw data, never fails ingestion
  • Routing decisions (transcript_path capture, status transitions, tool detection) run on the raw payload; only what leaves the ingest path is transformed
  • Additive schema only — one new app_settings key/value table; policy travels with GET /api/settings/export
  • Traversal guards: depth cap 32, node cap 20k, value regexes skip strings > 256 KB
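
A minimal sketch of how the traversal guards and the fail-safe fallback might fit together. The depth and node caps come from the list above. Everything else is an assumption: the `walk` and `sanitizeSafely` names, the stub shape, and the choice to treat an exceeded cap as an error that degrades to the stub (the thread does not say how lib/privacy.js reacts when a cap is hit):

```javascript
const MAX_DEPTH = 32; // depth cap from the PR description
const MAX_NODES = 20000; // node cap from the PR description

// Recursively copy a payload, passing every string through `visit`.
// Throws when either cap is exceeded (an assumption; see the lead-in).
function walk(value, visit, state = { nodes: 0 }, depth = 0) {
  if (depth > MAX_DEPTH) throw new Error("depth cap exceeded");
  if (++state.nodes > MAX_NODES) throw new Error("node cap exceeded");
  if (typeof value === "string") return visit(value);
  if (Array.isArray(value)) {
    return value.map((item) => walk(item, visit, state, depth + 1));
  }
  if (value && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = walk(child, visit, state, depth + 1);
    }
    return out;
  }
  return value; // numbers, booleans, null pass through unchanged
}

// Fail-safe wrapper: never throws and never returns the raw payload on error.
function sanitizeSafely(payload, visit) {
  try {
    return walk(payload, visit);
  } catch (err) {
    // Hypothetical metadata-only stub; the real stub shape is not shown in the PR text.
    return { _privacy: { failed: true } };
  }
}
```

Because the wrapper catches every error, a bug in a detector costs payload detail for that event rather than leaking raw data or failing ingestion.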

Deferred (per issue non-goals / open questions)

  • Applying the policy to import/reimport paths (~20 scattered insert sites; the issue's explicit alternative — a UI warning — is implemented instead)
  • One-time historical rewrite job for pre-existing events (issue lists as separate workflow)
  • Redaction metadata as an Events-page filter/facet (open question in the issue)
  • README-CN / README-VN translations

Verification

  • npm run test:server — 275 pass, 0 fail (14 new privacy tests)
  • npm run test:client — 198 pass
  • npm run build — tsc + vite clean; Prettier clean
  • README (feature row + API table) and ARCHITECTURE (module graph, component table, ER diagram) updated

…ok payloads

Implements the MVP for issue hoangsonww#148:

- New lib/privacy.js sanitizer applied at every event-insert site in
  routes/hooks.js BEFORE persistence and WebSocket broadcast: six built-in
  detectors (secret-named keys, Bearer tokens, common API-key formats,
  private-key blocks, opt-in email addresses and home-directory paths) plus
  up to 100 custom key/value regex rules with mask / hash / drop_field /
  drop_event_payload actions
- Conservative default policy: obvious secrets masked out of the box;
  hash action uses stable truncated SHA-256 so values stay correlatable
- Redacted events carry a _privacy counters block (rules applied, masked /
  hashed / dropped) without exposing originals; clean payloads are stored
  byte-identical so analytics, filters, and cost views are unaffected
- Fail-safe: sanitizer errors degrade to a metadata-only stub (never raw
  data) and never fail hook ingestion; invalid rules are rejected at save
- REST API: GET/PUT /api/privacy + POST /api/privacy/preview (non-persisting
  before/after, supports draft policies); documented in OpenAPI/Swagger
- Policy persisted in a new additive app_settings key/value table and
  included in GET /api/settings/export
- Settings panel: master toggle, per-detector toggles, rule CRUD, live
  sample preview, explicit import/reimport warning; localized en/zh/vi
- 14 server tests: policy validation, nested payload masking, key formats,
  summary sanitization on ingest + response, hash stability, drop actions,
  disabled passthrough, opt-in detectors, large payloads, preview isolation

Closes hoangsonww#148

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request introduces ingest-time privacy controls to redact, hash, or drop sensitive data from hook payloads and summaries before they are stored in SQLite or broadcast over WebSockets. It adds a new app_settings table, a server-side sanitizer with built-in and custom regex detectors, management and preview API endpoints, and a PrivacyControls settings UI. Feedback on these changes suggests slicing and sanitizing the prefix of oversized strings rather than skipping them entirely, compiling custom value rules with case-insensitivity (gi) to prevent under-redaction, and generating temporary client-side IDs for new rules to ensure stable React rendering keys.

Comment thread server/lib/privacy.js Outdated
Comment thread server/lib/privacy.js
Comment thread client/src/components/PrivacyControls.tsx
@hoangsonww added the bug, documentation, enhancement, help wanted, and good first issue labels on Jun 10, 2026
- lib/privacy: oversized strings (beyond the scan cap) are now masked
  wholesale instead of skipped — a string must never bypass value scanning.
  The cap itself is raised to 2 MB, above the 1 MB express body limit, so
  every string a real hook payload can carry is fully scanned
- lib/privacy: custom value rules compile with gi so case-variants of a
  pattern (Password vs password) cannot slip past redaction
- client(PrivacyControls): new rules get a client-side id so unsaved rules
  have stable React keys across delete/reorder
- tests: new oversize-string masking case; drop_event_payload case now
  proves case-insensitive matching (15 privacy tests total)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@linkvapeluckyman left a comment

Contributor Author

fix: address PR #166 review feedback

  • lib/privacy: oversized strings (beyond the scan cap) are now masked
    wholesale instead of skipped — a string must never bypass value scanning.
    The cap itself is raised to 2 MB, above the 1 MB express body limit, so
    every string a real hook payload can carry is fully scanned
  • lib/privacy: custom value rules compile with gi so case-variants of a
    pattern (Password vs password) cannot slip past redaction
  • client(PrivacyControls): new rules get a client-side id so unsaved rules
    have stable React keys across delete/reorder
  • tests: new oversize-string masking case; drop_event_payload case now
    proves case-insensitive matching (15 privacy tests total)
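
The two lib/privacy changes above can be sketched roughly as below. The 2 MB cap and the gi flags come from this comment. The helper names, the `[REDACTED:oversize]` label, and measuring the cap in UTF-8 bytes rather than characters are assumptions:

```javascript
const SCAN_CAP_BYTES = 2 * 1024 * 1024; // 2 MB, above the 1 MB express body limit

// Custom value rules compile with "gi": global so every occurrence is replaced,
// case-insensitive so "Password" and "password" are treated alike.
function compileValueRule(pattern) {
  return new RegExp(pattern, "gi");
}

// rules: array of { name, re } where re was produced by compileValueRule.
function scanString(value, rules) {
  // Oversized strings are masked wholesale rather than skipped,
  // so no string can bypass value scanning.
  if (Buffer.byteLength(value, "utf8") > SCAN_CAP_BYTES) {
    return "[REDACTED:oversize]"; // hypothetical label
  }
  let out = value;
  for (const { name, re } of rules) {
    out = out.replace(re, `[REDACTED:${name}]`);
  }
  return out;
}
```

Masking the whole string trades away the non-sensitive remainder of an oversized value, which is the conservative direction for a privacy control.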

@hoangsonww linked an issue on Jun 24, 2026 that may be closed by this pull request
7 tasks

Labels

  • bug: Something isn't working
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • good first issue: Good for newcomers
  • help wanted: Extra attention is needed

Development

Successfully merging this pull request may close these issues.

Feature: Configurable privacy controls for hook payload ingestion

3 participants