-
Notifications
You must be signed in to change notification settings - Fork 6
Added replicability check #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
blzserdos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see some comments below.
|
Hi, I updated the example following the revision. Let me know what you think! |
blzserdos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks!
MarieRoald
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good, just a couple of small details to consider :)
| models[rank] = {} | ||
| split_indices[rank] = {} | ||
|
|
||
| for split_no, (train_index, _) in enumerate(rskf.split(dataset)): | ||
| repeat_no = split_no // splits | ||
|
|
||
| # Sort rows for consistent ordering (not necessary) | ||
|
|
||
| sorted_train_index = sorted(train_index) | ||
| train = dataset[sorted_train_index] | ||
|
|
||
| train = train / tl.norm(train) # Pre-process the tensor without leaking info from other folds | ||
|
|
||
| current_models = fit_many_parafac(train.data, rank) | ||
| current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train) | ||
|
|
||
| if repeat_no not in models[rank].keys(): | ||
| models[rank][repeat_no] = [] | ||
|
|
||
| models[rank][repeat_no].append(current_model) | ||
|
|
||
| if repeat_no not in split_indices[rank].keys(): | ||
| split_indices[rank][repeat_no] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| models[rank] = {} | |
| split_indices[rank] = {} | |
| for split_no, (train_index, _) in enumerate(rskf.split(dataset)): | |
| repeat_no = split_no // splits | |
| # Sort rows for consistent ordering (not necessary) | |
| sorted_train_index = sorted(train_index) | |
| train = dataset[sorted_train_index] | |
| train = train / tl.norm(train) # Pre-process the tensor without leaking info from other folds | |
| current_models = fit_many_parafac(train.data, rank) | |
| current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train) | |
| if repeat_no not in models[rank].keys(): | |
| models[rank][repeat_no] = [] | |
| models[rank][repeat_no].append(current_model) | |
| if repeat_no not in split_indices[rank].keys(): | |
| split_indices[rank][repeat_no] = [] | |
| models[rank] = [[] for _ in range(repeats)] | |
| split_indices[rank] = [[] for _ in range(repeats)] | |
| for split_no, (train_index, _) in enumerate(rskf.split(dataset)): | |
| repeat_no = split_no // splits | |
| # Sort rows for consistent ordering (not necessary) | |
| sorted_train_index = sorted(train_index) | |
| train = dataset[sorted_train_index] | |
| train = train / tl.norm(train) # Pre-process the tensor without leaking info from other folds | |
| current_models = fit_many_parafac(train.data, rank) | |
| current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train) | |
| models[rank][repeat_no].append(current_model) |
Since the keys here are contiguous integers starting at zero, I think it makes more sense to keep this as a list. By pre-allocating it you can also avoid the initialisation logic. Unless there's a specific reason to keep it as a dictionary of course.
| # Here, we are skipping the mode we split (``mode=0``). | ||
|
|
||
| replicability_stability = {} | ||
| for rank in [2, 3, 4, 5]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for rank in [2, 3, 4, 5]: | |
| for rank in models.keys(): |
Since you have this information defined in our models dictionary, you don't need to repeat them here.
| for repeat_no in models[rank].keys(): | ||
| for i, cp_i in enumerate(models[rank][repeat_no]): | ||
| for j, cp_j in enumerate(models[rank][repeat_no]): | ||
| if i < j: # include every pair only once and omit i == j |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if i < j: # include every pair only once and omit i == j | |
| if i >= j: # include every pair only once and omit i == j | |
| continue |
If you flip the logic and use a continue statement, then you can un-indent the rest of the code a bit (which I at least find a bit easier to follow).
| for repeat_no in models[rank].keys(): | ||
| for i, cp_i in enumerate(models[rank][repeat_no]): | ||
| for j, cp_j in enumerate(models[rank][repeat_no]): | ||
| if i < j: # include every pair only once and omit i == j |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if i < j: # include every pair only once and omit i == j | |
| if i >= j: # include every pair only once and omit i == j | |
| continue |
Same as above
| # present in both subsets. | ||
|
|
||
| replicability_stability_alt = {} | ||
| for rank in [2, 3, 4, 5]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for rank in [2, 3, 4, 5]: | |
| for rank in models.keys(): |
| for repeat_no in models[rank].keys(): | ||
| for i, cp_i in enumerate(models[rank][repeat_no]): | ||
| for j, cp_j in enumerate(models[rank][repeat_no]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for repeat_no in models[rank].keys(): | |
| for i, cp_i in enumerate(models[rank][repeat_no]): | |
| for j, cp_j in enumerate(models[rank][repeat_no]): | |
| for repeat_no, current_models in enumerate(models[rank]): | |
| for i, cp_i in enumerate(current_models): | |
| for j, cp_j in enumerate(current_models): |
A small change to account for using a list of lists instead of a dict of lists for models[rank]
| for repeat_no in models[rank].keys(): | ||
| for i, cp_i in enumerate(models[rank][repeat_no]): | ||
| for j, cp_j in enumerate(models[rank][repeat_no]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for repeat_no in models[rank].keys(): | |
| for i, cp_i in enumerate(models[rank][repeat_no]): | |
| for j, cp_j in enumerate(models[rank][repeat_no]): | |
| for repeat_no, current_models in enumerate(models[rank]): | |
| for i, cp_i in enumerate(current_models): | |
| for j, cp_j in enumerate(current_models): |
A small change to account for using a list of lists instead of a dict of lists for models[rank]
|
|
||
| :orphan: | ||
|
|
||
| .. _sphx_glr_sg_execution_times: | ||
|
|
||
|
|
||
| Computation times | ||
| ================= | ||
| **01:57.453** total execution time for 9 files **from all galleries**: | ||
|
|
||
| .. container:: | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <style scoped> | ||
| <link href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.3.0/css/bootstrap.min.css" rel="stylesheet" /> | ||
| <link href="https://cdn.datatables.net/1.13.6/css/dataTables.bootstrap5.min.css" rel="stylesheet" /> | ||
| </style> | ||
| <script src="https://code.jquery.com/jquery-3.7.0.js"></script> | ||
| <script src="https://cdn.datatables.net/1.13.6/js/jquery.dataTables.min.js"></script> | ||
| <script src="https://cdn.datatables.net/1.13.6/js/dataTables.bootstrap5.min.js"></script> | ||
| <script type="text/javascript" class="init"> | ||
| $(document).ready( function () { | ||
| $('table.sg-datatable').DataTable({order: [[1, 'desc']]}); | ||
| } ); | ||
| </script> | ||
|
|
||
| .. list-table:: | ||
| :header-rows: 1 | ||
| :class: table table-striped sg-datatable | ||
|
|
||
| * - Example | ||
| - Time | ||
| - Mem (MB) | ||
| * - :ref:`sphx_glr_auto_examples_plot_replicability_analysis.py` (``../examples/plot_replicability_analysis.py``) | ||
| - 01:57.453 | ||
| - 0.0 | ||
| * - :ref:`sphx_glr_auto_examples_plot_bike_plotly.py` (``../examples/plot_bike_plotly.py``) | ||
| - 00:00.000 | ||
| - 0.0 | ||
| * - :ref:`sphx_glr_auto_examples_plot_core_consistency.py` (``../examples/plot_core_consistency.py``) | ||
| - 00:00.000 | ||
| - 0.0 | ||
| * - :ref:`sphx_glr_auto_examples_plot_labelled_decompositions.py` (``../examples/plot_labelled_decompositions.py``) | ||
| - 00:00.000 | ||
| - 0.0 | ||
| * - :ref:`sphx_glr_auto_examples_plot_optimisation_diagnostic.py` (``../examples/plot_optimisation_diagnostic.py``) | ||
| - 00:00.000 | ||
| - 0.0 | ||
| * - :ref:`sphx_glr_auto_examples_plot_outlier_detection.py` (``../examples/plot_outlier_detection.py``) | ||
| - 00:00.000 | ||
| - 0.0 | ||
| * - :ref:`sphx_glr_auto_examples_plot_selecting_aminoacids_components.py` (``../examples/plot_selecting_aminoacids_components.py``) | ||
| - 00:00.000 | ||
| - 0.0 | ||
| * - :ref:`sphx_glr_auto_examples_plot_split_half_analysis.py` (``../examples/plot_split_half_analysis.py``) | ||
| - 00:00.000 | ||
| - 0.0 | ||
| * - :ref:`sphx_glr_auto_examples_plot_working_with_xarray.py` (``../examples/plot_working_with_xarray.py``) | ||
| - 00:00.000 | ||
| - 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why this file wasn't ignored in .gitignore already. I don't think it should be commited here. It should be deleted and added to .gitignore
|
I updated the CI/CD-pipeline since the Python version (and dependencies) we ran with before were old. Merging |
Hi,
as mentioned in #23, here is an example notebook with the replicability check approach for selecting the number of components.
Kindly let me know what you think.