Skip to content

Draft: Add pre-commit-hook#598

Draft
jpwgnr wants to merge 3 commits into
scikit-hep:mainfrom
jpwgnr:add-pre-commit-hook
Draft

Draft: Add pre-commit-hook#598
jpwgnr wants to merge 3 commits into
scikit-hep:mainfrom
jpwgnr:add-pre-commit-hook

Conversation

@jpwgnr

@jpwgnr jpwgnr commented Jun 23, 2026

Copy link
Copy Markdown

Summary

Add a first-party decaylanguage-validate command and pre-commit hook for validating EvtGen .dec files with DecFileParser.

The validator reports stable diagnostic codes so downstream experiments can choose their own policy, for example ignoring LHCb-style unmatched CDecay source warnings with --ignore=DLW004.

Changes

  • Add decaylanguage.dec.validate with a CLI entry point exposed as decaylanguage-validate.
  • Add .pre-commit-hooks.yaml so external repositories can use id: decaylanguage-validate.
  • Add the validator to this repository's pre-commit config for bundled/test .dec files, excluding intentional negative fixtures.
  • Add selectable diagnostics:
    • DLP001 parse errors
    • DLW001 duplicate Decay blocks
    • DLW002 missing CopyDecay source
    • DLW003 duplicate Decay/CDecay
    • DLW004 missing CDecay source
    • DLW005 self-conjugate CDecay
    • DLW999 unclassified parser warnings
  • Add compact pre-commit-style output with source pointers for parse errors, diagnostic summaries, and --max-diagnostics.
  • Allow validating individual files or directories recursively.
  • Document CLI usage, pre-commit configuration, ignore policy, and expected failure output.
  • Add focused validator tests.

Testing

uv run pytest tests/dec/test_validate.py -q
uv run ruff check src/decaylanguage/dec/validate.py tests/dec/test_validate.py
uv run ruff format --check src/decaylanguage/dec/validate.py tests/dec/test_validate.py
uv run --with mypy mypy src/decaylanguage/dec/validate.py
uv run pre-commit validate-config .pre-commit-config.yaml
uv run pre-commit validate-manifest .pre-commit-hooks.yaml
uv run pre-commit run decaylanguage-validate --all-files

Also tested on the local LHCb DecFiles checkout:

uv run decaylanguage-validate --ignore=DLW004 /Users/jpherdi/Documents/CERN_Software/DecFiles

This leaves DLP001, DLW001, and DLW003 diagnostics after ignoring the expected LHCb-style DLW004 findings.

@jpwgnr jpwgnr marked this pull request as draft June 23, 2026 14:47
@jpwgnr

jpwgnr commented Jun 23, 2026

Copy link
Copy Markdown
Author

Hi @eduardo-rodrigues,

as a follow up to our discussion yesterday at the Decfiles repo, here a codex-based draft version of the pre-commit-hook.

Before marking this as ready, I will test this for sure a bit more in depth.

But at least it already goes along the lines, I would like it to be eventually.
If you find the time, I would be happy if you could give it some review at a conceptual basis.

Details we can probably discuss once we align on a conceptual vision of the tool.
Thank you!

@eduardo-rodrigues

Copy link
Copy Markdown
Member

Hello. I am overwhelmed these days. It is a good problem to have ... ;-)

I will try and get back to you this week. I am also adding @henryiii to the review for all his expertise.

Thank you very much!

@henryiii

henryiii commented Jun 25, 2026

Copy link
Copy Markdown
Member

Here's a quick AI review from Opus 4.8, you can feed this back into codex (rubber duck style) if you wish. Thanks for letting me know what model you are using, so I know which model to review with. :)

From my fairly quick look at the moment, this looks a lot like flake8, which is a good thing. Idea seems good too.

(Needs a rebase, too, of course)

🤖 AI text below 🤖

I've completed a thorough review of PR #598. Here's my assessment.

Review: PR #598 — Add decaylanguage-validate command and pre-commit hook

Verdict: Solid, well-documented draft. The code is clean (mypy --strict clean, ruff clean, 9 tests pass). Architecture is sound — all five DLW* warnings are genuinely emitted during parse() (verified: _find_parsed_decays, _add_decays_to_be_copied, _add_charge_conjugate_decays all run in parse()), and they're correctly captured inside warnings.catch_warnings(record=True) so they don't trip the suite's filterwarnings = error. Main items below are about finishing the draft and reducing a fragile coupling.

Should fix before un-drafting

1. PR description ↔ code mismatch (the core "unfinished" item). The description says "Add the validator to this repository's pre-commit config for bundled/test .dec files." But the final commit (06fbce5 ci: remove local decay validator hook) removed it, and .pre-commit-config.yaml now has no validate entry. So the new hook never runs against decaylanguage's own bundled files (DECAY_LHCB.DEC, DECAY_BELLE2.DEC, the negative fixtures) — it's dogfooded nowhere in CI. Either re-wire it (with the appropriate --ignore/exclude for intentional negative fixtures) or update the description to match.

Robustness / maintainability

2. Fragile string-coupling with dec.py, partially untested. DLW001DLW005 re-encode the exact warning strings from dec.py as regexes, in two places each (the fullmatch classifier and the _compact_warning_message extractor), with no shared source of truth. If anyone rewords a warning, the code silently degrades to DLW999 — no test would catch it. And tests only cover DLW001, DLW003, DLP001; the DLW002/DLW004/DLW005 classification + compaction paths are entirely untested. Minimum: add a test exercising each rule. Better (larger): have dec.py emit a custom Warning subclass carrying the code, so validate.py reads an attribute instead of pattern-matching prose.

3. Latent iterator-exhaustion bug in validate_files. ignore is materialized (ignored = tuple(ignore)) but additional_decay_models is not, yet it's splatted per-file inside the loop (load_additional_decay_models(*additional_decay_models)). A generator argument would be exhausted after the first file. Harmless from the CLI (argparse gives a list), but inconsistent — tuple(additional_decay_models) once at the top fixes it.

Minor / nits

  • AUTHORS.rst — Jan Wagner isn't added (project convention in CLAUDE.md / CONTRIBUTING).
  • Nonexistent path → reported as DLP001 parse-error (a FileNotFoundError surfaces through the broad except Exception), rather than a clearer "file not found."
  • files: '\.(dec|DEC)$' in the hook only matches all-lower/all-upper, whereas the CLI dir-walk is case-insensitive (.suffix.lower()). Mixed-case .Dec would be skipped by the hook but caught by the CLI. Trivial.

Nice touches worth keeping

Correct grammar-load ordering for --additional-decay-model (called before parse()), os.path.relpath guarded against the Windows cross-drive ValueError, NO_COLOR/tty-aware coloring, proper exit codes via SystemExit(main()), and --ignore codes validated against known codes/prefixes.

Want me to post these as inline comments on the PR, or draft a review comment summarizing them?

@eduardo-rodrigues eduardo-rodrigues added the enhancement New feature or request label Jun 26, 2026
@eduardo-rodrigues

Copy link
Copy Markdown
Member

Hello @jpwgnr, @henryiii provided a great review hence I will wait for you to act on it before looking further (at details). Is that OK with you? Thank you and enjoy the weekend, both.

@jpwgnr

jpwgnr commented Jun 26, 2026

Copy link
Copy Markdown
Author

Yes, that is fine. I am on holidays for one and a half week, so not sure when I have the time to act on it. But once I am back, I will do it as one of the first things.

@jpwgnr

jpwgnr commented Jun 26, 2026

Copy link
Copy Markdown
Author

And thanks for the review by the way. It all sounds reasonable.

@eduardo-rodrigues

Copy link
Copy Markdown
Member

Super! My all means do enjoy your hols! No rush. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants