Skip to content

feat: add --staged diff scope for pre-commit hook scanning (#18)#53

Open
ChrisJr404 wants to merge 1 commit into
knostic:masterfrom
ChrisJr404:feat/diff-based-analysis-18
Open

feat: add --staged diff scope for pre-commit hook scanning (#18)#53
ChrisJr404 wants to merge 1 commit into
knostic:masterfrom
ChrisJr404:feat/diff-based-analysis-18

Conversation

@ChrisJr404
Copy link
Copy Markdown

What

Adds a --staged flag to openant scan and openant diff that runs the
incremental pipeline against the staged index (git diff --cached) instead
of a committed ref. This is the workflow the DiffGuard project from #18
is built around: scan what is about to be committed, before the commit lands.

The existing --diff-base / --pr / --incremental modes already cover
post-commit and PR-time scanning. --staged fills the pre-commit gap, which
makes the diff-mode pipeline usable from a pre-commit hook or a quick local
"scan what I'm about to commit" run without first committing.

Usage

# Pre-commit hook style: scan only staged changes against HEAD.
git add path/to/file.py
openant scan --staged --skip-dynamic-test

# Same thing via the shorter diff verb.
openant diff --staged

In a .git/hooks/pre-commit:

#!/bin/sh
exec openant diff --staged --skip-dynamic-test --output .openant/staged

How

  • New BuildStagedManifest, StagedChangedFiles, StagedHunksForFile in
    internal/git/diff.go. Same Manifest shape as the committed-diff path,
    with base_ref="STAGED", base_sha=HEAD, and a zero placeholder for
    head_sha (the index has no SHA).
  • The Python core reads these fields as opaque strings, so reports surface
    base=STAGED cleanly without any change in libs/openant-core.
  • diffOpts grows a staged bool; validate() makes --staged mutually
    exclusive with --diff-base / --pr.
  • modeOpts / modeDecision thread the staged choice through selectMode,
    so scan, diff, and the project meta.json running state agree on the
    scan kind.
  • README gets a short "Incremental and diff-based scans" section listing the
    four scopes (--diff-base, --pr, --staged, --incremental).

Notes

Untracked files are not in the index, so they do not appear in the staged
manifest. Pre-commit hooks typically git add first; not second-guessing
that contract here.

git diff --cached --unified=0 is the same hunk format the existing
committed-diff path consumes, so the rest of the pipeline (parser tagging,
diff_filter, reporter) needs no changes.

Tests

10 new tests, all passing:

  • internal/git: TestStagedChangedFilesAndHunks (stages two files plus a
    worktree-only edit, asserts --cached does not bleed),
    TestBuildStagedManifest, TestBuildStagedManifestChangedFilesScopeOmitsHunks,
    TestBuildStagedManifestEmptyIndex, TestBuildStagedManifestRejectsBadScope.
  • cmd: TestValidateModeFlags gains five rows for the new conflict combos
    (staged + diffBase, staged + pr, staged + incremental, full + staged,
    staged alone); TestSelectModeStagedSetsStagedFlag.
  • cmd/diff_shared_test.go: TestDiffOptsValidateStaged (7 sub-cases) and
    TestDiffOptsIsSetStaged.

go vet ./... clean. go test ./... green on all packages.

Closes #18.

Adds a --staged flag to `openant scan` and `openant diff` that runs the
incremental pipeline against the staged index (`git diff --cached`) instead
of a committed ref. This is the workflow described by the DiffGuard project
referenced in knostic#18: scan only what's about to be committed, before commit.

Implementation:

- New BuildStagedManifest, StagedChangedFiles, StagedHunksForFile in
  internal/git/diff.go. The manifest's base_ref is the literal "STAGED",
  base_sha is HEAD, and head_sha is a zero placeholder (the index has no
  SHA). The Python core reads these as opaque strings, so reports show
  "base=STAGED" without any core changes.
- diffOpts grows a staged bool; validate() makes --staged mutually
  exclusive with --diff-base/--pr.
- modeOpts/modeDecision thread the staged choice through selectMode so
  scan, diff, and the meta.json running state all agree.
- README gains a short "Incremental and diff-based scans" section listing
  the four scopes (--diff-base, --pr, --staged, --incremental).

Untracked files are not in the index, so they don't appear in the staged
manifest. Pre-commit hooks usually just `git add` first, which is the
standard expectation; we don't try to second-guess that.

Tests:

- 5 new git pkg tests: staged file/hunk extraction, manifest construction
  for changed_functions and changed_files scopes, empty-index edge case,
  bad-scope rejection. The hunk extraction test stages two files and
  edits a third worktree-only to confirm --cached doesn't bleed.
- 5 new cmd pkg tests: validateModeFlags coverage for the four new
  conflict combos, selectMode confirms the Staged decision flag.
- New diff_shared_test.go covers diffOpts.validate / isSet for the
  staged path.

Closes knostic#18.
if !quiet {
output.PrintKeyValue("Diff base", fmt.Sprintf("%s (%s)", m.BaseRef, shortSHA(m.BaseSHA)))
output.PrintKeyValue("Diff head", shortSHA(m.HeadSHA))
if m.BaseRef == git.StagedRef {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium — UX · Python report consumers don't mirror this STAGED special-case

Issue: Nice that the CLI output renders Diff head: index (staged) for staged manifests. But the same head_sha = 0000…0000 placeholder flows into libs/openant-core/generate_report.py:109-121 (_build_header_subtitle) and libs/openant-core/export_csv.py:54-65 (_format_diff_banner), both of which call _short_sha(head_sha)00000000. End-users will see Incremental {repo} · 5e1d2a4..00000000 · … in the HTML report and CSV banner.

Suggestion: Mirror this special-case on the Python side — a one-line guard if base_ref == "STAGED": head = "index" in both _build_header_subtitle and _format_diff_banner keeps the report symmetric with the CLI output.

if decision.Kind == config.ScanKindDiff {
manifestOpts.base = decision.Base
manifestOpts.scope = decision.Scope
manifestOpts.staged = decision.Staged
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium — Architecture · Staged is consumed here but never persisted to ScanMeta

Issue: decision.Staged flows into manifestOpts.staged for this run, but the meta written a few lines down at scan.go:300-311 only persists Kind=diff, Base="", Scope=... — the staged marker is lost. Two consequences:

  1. Crashed-resume silently degrades to full. If a staged scan writes meta.status=running and then crashes, a subsequent openant scan (no flags) hits the resume path at scan.go:272-276, which returns modeDecision{Kind: "diff", Base: "", Scope: ...} (Staged defaults to false). In runScan, prepareDiffManifest then sees an unset diffOpts, isSet() returns false, and the diff manifest is silently skipped — falling back to a full scan without telling the user.
  2. Audit trail is wrong. Terminal meta after a successful staged scan is indistinguishable from a malformed --diff-base run with empty base.

Suggestion: Either (a) add a Staged bool field (with a staged json tag, omitempty) to ScanMeta and propagate through both the writer and the resume path, or (b) skip persisting meta for staged runs entirely (since staged is meant to be ephemeral). (b) is simpler and matches the "pre-commit ephemeral" framing in the description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[feat] diff based analysis

2 participants