This adds the unique spras_revision to every parameter combination (before hashing) and the dataset label, to provide OSDF support at the level of deterministic algorithms.
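As a rough illustration of the idea (not the code in this PR), folding a revision string into a parameter combination before hashing could look like the sketch below. The function name `hash_params`, the `_spras_revision` key, the JSON serialization, and the use of SHA-1 are all assumptions made for the example.

```python
import hashlib
import json


def hash_params(params: dict, spras_revision: str, length: int = 7) -> str:
    """Hypothetical helper: hash one parameter combination with the revision mixed in."""
    # Copy the parameters and add the revision, so the same parameters
    # produce a different hash whenever the revision changes.
    salted = dict(params, _spras_revision=spras_revision)
    # Sort keys so the hash does not depend on dictionary ordering.
    encoded = json.dumps(salted, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()[:length]
```

With this shape, outputs from two revisions never share a hash-named directory, at the cost of recomputing everything when the revision changes.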
we also make it lazy
I was going to ask @ntalluri about this, since I wasn't quite sure if we will have expensive graph heuristics or not.
I did decouple this from |
There could be more than one way to design this sensibly. One would be that if heuristics are enabled in the config file, the graph summary table is generated automatically. This produces more output than requested, which is slightly undesirable. Another could be to move the heuristic calculations inside each --parameters> subdirectory, which may be where you are headed. If the statistics are written as a file for that one pathway, that file could be consumed for heuristics (or the statistics could be used for heuristics and then written to disk). Later, if the graph summary table is requested, it would grab the precomputed statistics from those files in the subdirectories.
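A minimal sketch of that second option, assuming a hypothetical layout where each run subdirectory holds a `pathway.txt` edge list (two node names per line, no header) and a cached `stats.json`. The file names, function names, and the two statistics shown are illustrative only and are not taken from SPRAS.

```python
import json
from pathlib import Path


def compute_stats(pathway_file: Path) -> dict:
    """Compute simple graph statistics from a whitespace-separated edge list."""
    nodes, edges = set(), 0
    for line in pathway_file.read_text().splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        nodes.update(parts[:2])
        edges += 1
    return {"nodes": len(nodes), "edges": edges}


def stats_for(run_dir: Path) -> dict:
    """Return the statistics for one run, computing and caching them if needed."""
    cache = run_dir / "stats.json"
    if cache.exists():
        # Reuse the precomputed statistics written by an earlier step.
        return json.loads(cache.read_text())
    stats = compute_stats(run_dir / "pathway.txt")
    cache.write_text(json.dumps(stats))
    return stats


def summary_table(out_dir: Path) -> list[dict]:
    """Build summary rows by collecting per-run statistics from the subdirectories."""
    return [
        {"run": d.name, **stats_for(d)}
        for d in sorted(out_dir.iterdir())
        if d.is_dir()
    ]
```

Here the heuristics step and the summary table would both go through `stats_for`, so whichever runs first pays the computation cost and the other reads the file.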
I'll mark this as a draft for now and design something in line with your second proposal.
whoops! accidentally feature-regressed
Would you be able to explain the goal of this PR and what changes it makes in the top comment? Also, why does this depend on the SPRAS revision?
I've edited the top comment to mention the heuristics PR 👍, though the motivation was already present. As mentioned in the meeting and in the top-level comment, this depends on the integration testing part of the SPRAS revision and not the immutability section.
We also make graph statistics lazy. Laziness isn't used in summary.py, but I assume that we'll have more computationally expensive graph statistics as SPRAS develops, especially since they can take a long time to compute for our larger graphs, so this also splits up statistic generation into different rules.
Most importantly, this allows us to re-use statistics by consuming specific statistics as input files, which is currently used in #431.