Conversation
This adds the unique spras_revision to every single paramater combination (before hashing) and the dataset label, to provide OSDF support on the level of deterministic algorithms.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
agitter
left a comment
There was a problem hiding this comment.
I finished another partial revision. I still haven't thought about the testing implications carefully.
| return f"v{importlib.metadata.version('spras').replace('.', '_')}" | ||
|
|
||
| def attach_spras_revision(label: str) -> str: | ||
| return f"{label}_{spras_revision()}" |
There was a problem hiding this comment.
I'm thinking through whether there are other ways to get this same behavior without making filenames longer. The subdirectory names that already follow the --params- pattern are long already, and now we're extending them. The only other idea is to use subdirectories instead, which isn't necessarily an improvement.
There was a problem hiding this comment.
This is a little concerning, though thanks to #434, I'm not too worried about files being the primary interface for organizing SPRAS output. We should still document this file directory naming once we have actual SPRAS workflow documentation.
whoops! accidentally feature-regressed
agitter
left a comment
There was a problem hiding this comment.
A few more comments. I still haven't looked through all the test code.
|
Since both past approaches do not scale well, I've decided to only focus on the RECORD file. This fails specifically in the case where SPRAS is somehow ran without being installed as a python module, and I can't think of a plausible scenario where this happens. |
This change means that output files will not be reused whenever SPRAS is updated, furthering the immutability goal necessary to get OSDF integration working for SPRAS benchmarking. ('updated' depends on the git commit hash or the actual SPRAS release version)
This adds the unique
spras_revisionto every single paramater combination (before hashing) and the dataset label, to provide OSDF support on the level of deterministic, non-seeded algorithms when datasets are immutable.This has the added benefit of allowing SPRAS users to simply upgrade their SPRAS version without needing to clear
output, which complements #380. The refactored test also partially covers #165 and #45. (This is also where the majority of the code comes from: The actual feature patch here is a 50 line change.)See #321 implemented by #335 for handling nondeterministic algorithms / seeded algorithms.
To make this change, a significant test refactor in
test/analysiswas needed to remove hardcoded paths (which contained the hashes being modified per-commit in this PR.) It turns out that whenever we make any change to the hash, this [original: the patch here fixes this] test breaks! That's why this PR is depended on by so many other PRs.