Skip to content

feat!: SPRAS revision#320

Open
tristan-f-r wants to merge 45 commits intoReed-CompBio:mainfrom
tristan-f-r:hash
Open

feat!: SPRAS revision#320
tristan-f-r wants to merge 45 commits intoReed-CompBio:mainfrom
tristan-f-r:hash

Conversation

@tristan-f-r
Copy link
Collaborator

@tristan-f-r tristan-f-r commented Jul 9, 2025

This change means that output files will not be reused whenever SPRAS is updated, furthering the immutability goal necessary to get OSDF integration working for SPRAS benchmarking. ('updated' depends on the git commit hash or the actual SPRAS release version)

This adds the unique spras_revision to every single paramater combination (before hashing) and the dataset label, to provide OSDF support on the level of deterministic, non-seeded algorithms when datasets are immutable.

This has the added benefit of allowing SPRAS users to simply upgrade their SPRAS version without needing to clear output, which complements #380. The refactored test also partially covers #165 and #45. (This is also where the majority of the code comes from: The actual feature patch here is a 50 line change.)

See #321 implemented by #335 for handling nondeterministic algorithms / seeded algorithms.


To make this change, a significant test refactor in test/analysis was needed to remove hardcoded paths (which contained the hashes being modified per-commit in this PR.) It turns out that whenever we make any change to the hash, this [original: the patch here fixes this] test breaks! That's why this PR is depended on by so many other PRs.

This adds the unique spras_revision to every single paramater combination (before hashing) and the dataset label, to provide OSDF support on the level of deterministic algorithms.
@tristan-f-r tristan-f-r marked this pull request as ready for review July 9, 2025 20:51
@tristan-f-r tristan-f-r added enhancement New feature or request needed for benchmarking Priority PRs needed for the benchmarking paper labels Jul 9, 2025
@tristan-f-r tristan-f-r changed the title feat: spras_revision feat: SPRAS revision Jul 9, 2025
@tristan-f-r

This comment was marked as outdated.

@tristan-f-r tristan-f-r marked this pull request as draft July 9, 2025 21:37
@tristan-f-r tristan-f-r marked this pull request as ready for review July 10, 2025 19:34
@tristan-f-r tristan-f-r changed the title feat: SPRAS revision feat!: SPRAS revision Jul 10, 2025
@tristan-f-r

This comment was marked as outdated.

@tristan-f-r tristan-f-r added the P-high This is a blocker for many PRs/issues/features label Jul 24, 2025
@github-actions github-actions bot added the merge-conflict This PR has merge conflicts. label Jan 9, 2026
@tristan-f-r tristan-f-r added the awaiting-author Author of the PR needs to fix something from a review / etc. label Jan 10, 2026
@github-actions github-actions bot removed the merge-conflict This PR has merge conflicts. label Jan 10, 2026
@tristan-f-r tristan-f-r added P-high This is a blocker for many PRs/issues/features and removed awaiting-author Author of the PR needs to fix something from a review / etc. P-medium medium prirotity; this is needed for some external service or another PR labels Jan 10, 2026
@tristan-f-r tristan-f-r added the tuning Workflow-spanning algorithm tuning label Jan 13, 2026
Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished another partial revision. I still haven't thought about the testing implications carefully.

return f"v{importlib.metadata.version('spras').replace('.', '_')}"

def attach_spras_revision(label: str) -> str:
return f"{label}_{spras_revision()}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking through whether there are other ways to get this same behavior without making filenames longer. The subdirectory names that already follow the --params- pattern are long already, and now we're extending them. The only other idea is to use subdirectories instead, which isn't necessarily an improvement.

Copy link
Collaborator Author

@tristan-f-r tristan-f-r Jan 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little concerning, though thanks to #434, I'm not too worried about files being the primary interface for organizing SPRAS output. We should still document this file directory naming once we have actual SPRAS workflow documentation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unresolving this for now so we can get broader feedback from @annaritz and @ntalluri. I can be a meeting agenda item if needed.

@github-actions github-actions bot added the merge-conflict This PR has merge conflicts. label Jan 31, 2026
@github-actions github-actions bot removed the merge-conflict This PR has merge conflicts. label Jan 31, 2026
Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments. I still haven't looked through all the test code.

@tristan-f-r
Copy link
Collaborator Author

tristan-f-r commented Jan 31, 2026

Since both past approaches do not scale well, I've decided to only focus on the RECORD file.

This fails specifically in the case where SPRAS is somehow ran without being installed as a python module, and I can't think of a plausible scenario where this happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request needed for benchmarking Priority PRs needed for the benchmarking paper P-high This is a blocker for many PRs/issues/features tuning Workflow-spanning algorithm tuning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants