
feat: add --fresh flag to parse for forced reparse #38

Open
joshbouncesecurity wants to merge 4 commits into knostic:release/2026-05-14 from joshbouncesecurity:feat/issue16-19-parse-fresh


@joshbouncesecurity
Contributor

Summary

Adds --fresh to openant parse to force a full reparse from scratch without manually deleting dataset.json. Useful when parser improvements are deployed and the existing dataset needs to be regenerated.

The JS parser also now logs a hint pointing at --fresh when existing units are reused, so users discover the flag when they need it.

Addresses item 19 from #16 (does not close the issue).

Test plan

  • openant parse <repo> reuses existing units when dataset.json already exists (default).
  • openant parse <repo> --fresh deletes the cached dataset and reparses from scratch.
  • After running with --fresh, the JS parser hint about reused units no longer fires for that run.
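
As a rough illustration of the two behaviours in the test plan, here is a minimal Python sketch of merge-versus-fresh semantics. It is not the project's code: the helper name merge_units and the flat {unit_id: unit} layout of dataset.json are assumptions made for the example, since the real schema is not shown in this PR.

```python
import json
import os


def merge_units(dataset_path, new_units, fresh=False):
    """Merge new_units into the dataset stored at dataset_path.

    Hypothetical helper; the {unit_id: unit} layout is an assumption.
    Returns (merged_units, reused_count).
    """
    if fresh:
        # Drop the cached dataset so every unit is regenerated.
        try:
            os.remove(dataset_path)
        except FileNotFoundError:
            pass  # nothing cached yet, so there is nothing to delete

    try:
        with open(dataset_path, encoding="utf-8") as f:
            existing = json.load(f)
    except FileNotFoundError:
        existing = {}

    reused = 0
    merged = dict(existing)
    for unit_id, unit in new_units.items():
        if unit_id in merged:
            reused += 1  # existing unit is kept as-is (default behaviour)
        else:
            merged[unit_id] = unit

    with open(dataset_path, "w", encoding="utf-8") as f:
        json.dump(merged, f)
    return merged, reused
```

In this sketch a non-zero reused count is the situation in which the hint about --fresh would be worth printing; with fresh=True the count is always zero.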

@joshbouncesecurity
Contributor Author

Manual verification

  • openant parse <repo> twice in a row: second run is fast (cached dataset.json reused).
  • openant parse <repo> --fresh: deletes existing dataset.json, runs full reparse (longer).
  • openant parse <missing-repo> --fresh: doesn't crash when dataset.json doesn't pre-exist (no-op).
  • openant parse --help: --fresh listed; help text mentions "only deletes dataset.json; other artifacts in the output dir are preserved".
  • Race: two openant parse --fresh simultaneously on the same repo: no FileNotFoundError from racing os.remove (catch added).

@joshbouncesecurity
Contributor Author

Local test results

Built the Go CLI from this branch and exercised --fresh end-to-end on Windows using libs/openant-core/tests/fixtures/sample_python_repo.

Commands run:

go build -o openant.exe ./
./openant.exe parse <fixture> --output _out          # run 1: fresh output dir
./openant.exe parse <fixture> --output _out --fresh  # run 2: --fresh on existing dataset
./openant.exe parse <fixture> --output _empty --fresh  # run 3: --fresh with no pre-existing dataset

Outcome:

  • --fresh listed in parse --help with description "Delete existing dataset.json and reparse from scratch (other artifacts preserved)" ✅
  • Run with --fresh against existing dataset prints "[Parser] --fresh: deleted existing dataset.json" and rebuilds the dataset (mtime advanced) ✅
  • Run with --fresh and no pre-existing dataset.json: no crash, no "deleted" message, parse runs cleanly ✅
  • Race-condition catch and JS-parser hint: not exercised in this manual pass (covered by automated tests in the diff).

@ar7casper changed the base branch from master to release/2026-05-14 on May 14, 2026 12:59
@ar7casper
Collaborator

Hey @joshbouncesecurity — running through your draft batch, this one's the easiest to land. Sharing a quick review now since it looks essentially mergeable:

Things done quietly well:

  • TOCTOU-safe deletion in parser_adapter.py:108-116: try/except FileNotFoundError instead of if exists() then remove(), with a comment explaining the race. Easy to miss; you didn't.
  • Test pyramid is right: Python (149 LOC, 5 cases) covers the actual deletion semantics with a stubbed parser; Go (114 LOC, 5 cases) covers flag registration + arg pass-through. No overlap, no gaps.
  • Refactor commit eb8ce74 extracts buildParsePyArgs specifically so the Go side becomes testable without invoking the Python runtime — good engineering.
  • Help text accurately scopes the deletion: "Delete existing dataset.json … other artifacts preserved." Important — --fresh reads broader than what it does.
  • Discoverability via JS hint: surfaces the flag exactly when duplicates are skipped.
  • Order is correct: --fresh deletion runs before _maybe_apply_diff_filter, so --fresh + --diff-manifest composes cleanly.
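
For readers unfamiliar with the TOCTOU point in the first bullet, the pattern can be sketched as below. This is a generic illustration, not the code at parser_adapter.py:108-116; the function name remove_if_present is hypothetical.

```python
import os


def remove_if_present(path):
    """Delete path and report whether this call removed it.

    The racy alternative is:
        if os.path.exists(path):
            os.remove(path)
    where another process can delete the file between the check and the
    remove, raising FileNotFoundError. Attempting the removal directly and
    catching the exception closes that window.
    """
    try:
        os.remove(path)
    except FileNotFoundError:
        return False  # already gone, possibly removed by a concurrent run
    return True
```

The return value is only a convenience for the sketch; it would let a caller log a "deleted" message only when a file was actually removed.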

Three small things before flipping out of draft:

  1. Wrong issue reference in commit subject 819ad20: the message reads "feat: add --fresh flag to parse command (#21)", but #21 is the Express anonymous handler bug. This PR addresses item 19 from #16 (Fork contributions: bug fixes, Windows compatibility, pipeline resilience, and new features). Worth a git commit --amend or fixup before the squash so it doesn't end up wrong in history forever.

  2. JS hint references a flag the JS parser doesn't have (unit_generator.js:420):

    console.error(`  Note: ${duplicateCount} existing units kept as-is (use --fresh to regenerate all units)`);

    unit_generator.js is also runnable standalone; --fresh only exists on the openant parse wrapper. Tighten to:

    console.error(`  Note: ${duplicateCount} existing units kept as-is (use 'openant parse --fresh' to regenerate all units)`);

  3. --fresh + --diff-manifest interaction not covered by tests. The composition is mechanically correct (verified by reading the order), but locking it down with one extra Python test would close the gap.

Nit (optional): cli.py:124 uses fresh=getattr(args, "fresh", False). argparse guarantees args.fresh exists when the flag has a default=, so args.fresh would work. Matches the surrounding style (name, diff_manifest are the same), so probably right to leave consistent.
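
To make the nit concrete, here is a small standalone argparse sketch. The parser layout is an assumption for illustration and is not copied from cli.py. A store_true flag defaults to False, so the attribute is present on the parsed namespace whether or not the flag was passed; getattr with a default only matters for a Namespace built some other way.

```python
import argparse


def build_parser():
    """Build a toy CLI with a parse subcommand (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="openant")
    sub = parser.add_subparsers(dest="command")
    parse_cmd = sub.add_parser("parse")
    parse_cmd.add_argument("repo")
    # store_true implies default=False, so args.fresh always exists
    # after parsing the parse subcommand.
    parse_cmd.add_argument("--fresh", action="store_true")
    return parser


parser = build_parser()
without_flag = parser.parse_args(["parse", "some/repo"])
with_flag = parser.parse_args(["parse", "some/repo", "--fresh"])

# A hand-built Namespace (for example in a unit test) has no such guarantee,
# which is the case getattr(args, "fresh", False) protects against.
bare = argparse.Namespace(repo="some/repo")
fresh_or_default = getattr(bare, "fresh", False)
```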

Address items 1-3 and this is good to flip out of draft. Nice work.

@joshbouncesecurity
Contributor Author

Thanks @ar7casper, addressed in 56ce18b:

  1. Wrong issue ref in 819ad20 — agreed it's wrong, but since this will be squash-merged the PR title (which is correct) is what'll land in history. Happy to fixup the commit if you'd prefer it clean regardless.

  2. JS hint — fixed to 'openant parse --fresh'.

  3. --fresh + --diff-manifest test — added test_fresh_and_diff_manifest_compose_correctly.

Also fixed a regression I'd introduced relative to upstream: the branch had reverted the --level default back to "all" (diverged from the fix in #35). Restored to "reachable" and merged the level tests from knostic/master into parse_test.go alongside the new --fresh tests.

joshbouncesecurity and others added 4 commits May 14, 2026 16:26
The parse step's unit generator merges new units into an existing
dataset.json, preserving old units as-is. This means changes to the
parser (e.g., improved call graph resolution) don't take effect for
previously-parsed units unless the dataset is deleted manually.

Add --fresh flag to parse (and ensure scan --fresh also clears the
dataset) so users can force a full reparse when needed.

- Go CLI: add --fresh flag to parse command, pass through to Python
- Python CLI: add --fresh arg to parse subparser
- parser_adapter: delete existing dataset.json when fresh=True
- scanner: include dataset.json in fresh cleanup alongside checkpoints
- unit_generator: add stderr note when existing units are reused

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

- Extract buildParsePyArgs from runParse so the helper is the source of
  truth (tests no longer keep a parallel copy with 'keep in sync')
- Replace exists()+remove() with try/except FileNotFoundError to avoid
  TOCTOU race when two --fresh parses run concurrently
- Clarify --fresh help text and docstring: only dataset.json is deleted;
  other artifacts in the output dir are preserved
- Restore --level default to "reachable" and guard to level != "reachable"
  (reverts regression introduced when branch diverged from upstream fix #35)
- Restore upstream parse_test.go level tests, updated to 7-arg signature
- Fix JS duplicate-units hint to say 'openant parse --fresh' not --fresh
- Add test: --fresh + --diff-manifest compose correctly

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@joshbouncesecurity force-pushed the feat/issue16-19-parse-fresh branch from 56ce18b to 7ece34f on May 14, 2026 13:44
@joshbouncesecurity marked this pull request as ready for review on May 14, 2026 13:48
2 participants