
Conversation

@cchatzis

Hi,

As mentioned in #23, here is an example notebook demonstrating the replicability check approach for selecting the number of components.

Kindly let me know what you think.
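
For readers skimming this thread, here is a minimal sketch of the idea behind the replicability check (not the notebook's actual code): repeatedly split the sample mode, fit a CP/PARAFAC model on each subset, and compare the factors obtained from different subsets with the factor match score, skipping the split mode. The toy data, the plain `parafac` call (the notebook uses a multi-initialisation helper), and the parameter choices below are placeholders.

import numpy as np
import tensorly as tl
from sklearn.model_selection import RepeatedKFold
from tensorly.decomposition import parafac
from tlviz.factor_tools import factor_match_score

# Placeholder data: a samples x J x K tensor and one candidate number of components.
dataset = np.random.standard_normal((30, 20, 10))
rank = 3

rskf = RepeatedKFold(n_splits=2, n_repeats=5, random_state=0)
models = []
for train_index, _ in rskf.split(dataset):
    train = dataset[sorted(train_index)]  # subset of samples
    train = train / tl.norm(train)        # normalise each subset separately
    models.append(parafac(train, rank, init="random", random_state=0))

# Compare models fitted on different subsets; skip the sample mode (mode 0),
# since the subsets contain different samples.
scores = [
    factor_match_score(models[i], models[j], skip_mode=0)
    for i in range(len(models))
    for j in range(i + 1, len(models))
]
print(f"Mean factor match score for rank {rank}: {np.mean(scores):.3f}")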

@cchatzis changed the title from "Added replicability example notebook" to "Added replicability check" on Nov 21, 2025

@blzserdos left a comment


Please see some comments below.

@cchatzis (Author)

Hi, I updated the example following the revision. Let me know what you think!

@cchatzis requested a review from blzserdos on November 28, 2025 at 14:35

@blzserdos left a comment


Looks good! Thanks!

@MarieRoald (Collaborator) left a comment


This looks really good, just a couple of small details to consider :)

Comment on lines +92 to +114
models[rank] = {}
split_indices[rank] = {}

for split_no, (train_index, _) in enumerate(rskf.split(dataset)):
    repeat_no = split_no // splits

    # Sort rows for consistent ordering (not necessary)

    sorted_train_index = sorted(train_index)
    train = dataset[sorted_train_index]

    train = train / tl.norm(train)  # Pre-process the tensor without leaking info from other folds

    current_models = fit_many_parafac(train.data, rank)
    current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train)

    if repeat_no not in models[rank].keys():
        models[rank][repeat_no] = []

    models[rank][repeat_no].append(current_model)

    if repeat_no not in split_indices[rank].keys():
        split_indices[rank][repeat_no] = []

Suggested change
-models[rank] = {}
-split_indices[rank] = {}
-for split_no, (train_index, _) in enumerate(rskf.split(dataset)):
-    repeat_no = split_no // splits
-    # Sort rows for consistent ordering (not necessary)
-    sorted_train_index = sorted(train_index)
-    train = dataset[sorted_train_index]
-    train = train / tl.norm(train)  # Pre-process the tensor without leaking info from other folds
-    current_models = fit_many_parafac(train.data, rank)
-    current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train)
-    if repeat_no not in models[rank].keys():
-        models[rank][repeat_no] = []
-    models[rank][repeat_no].append(current_model)
-    if repeat_no not in split_indices[rank].keys():
-        split_indices[rank][repeat_no] = []
+models[rank] = [[] for _ in range(repeats)]
+split_indices[rank] = [[] for _ in range(repeats)]
+for split_no, (train_index, _) in enumerate(rskf.split(dataset)):
+    repeat_no = split_no // splits
+    # Sort rows for consistent ordering (not necessary)
+    sorted_train_index = sorted(train_index)
+    train = dataset[sorted_train_index]
+    train = train / tl.norm(train)  # Pre-process the tensor without leaking info from other folds
+    current_models = fit_many_parafac(train.data, rank)
+    current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train)
+    models[rank][repeat_no].append(current_model)

Since the keys here are contiguous integers starting at zero, I think it makes more sense to use a list here. By pre-allocating it you can also avoid the initialisation logic. Unless there's a specific reason to keep it as a dictionary, of course.
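
To make the contrast concrete, here is a tiny standalone illustration of the pre-allocation pattern the suggestion describes (the repeats/splits values and the toy loop are placeholders, not the notebook's variables):

repeats, splits = 3, 2

# Dict variant: needs an initialisation check inside the loop.
grouped_dict = {}
for split_no in range(repeats * splits):
    repeat_no = split_no // splits
    if repeat_no not in grouped_dict:
        grouped_dict[repeat_no] = []
    grouped_dict[repeat_no].append(split_no)

# List variant: pre-allocate one empty list per repeat, no check needed.
grouped_list = [[] for _ in range(repeats)]
for split_no in range(repeats * splits):
    grouped_list[split_no // splits].append(split_no)

assert grouped_list == [grouped_dict[r] for r in range(repeats)]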

# Here, we are skipping the mode we split (``mode=0``).

replicability_stability = {}
for rank in [2, 3, 4, 5]:

Suggested change
-for rank in [2, 3, 4, 5]:
+for rank in models.keys():

Since you have this information defined in the models dictionary, you don't need to repeat it here.

for repeat_no in models[rank].keys():
    for i, cp_i in enumerate(models[rank][repeat_no]):
        for j, cp_j in enumerate(models[rank][repeat_no]):
            if i < j:  # include every pair only once and omit i == j

Suggested change
-            if i < j:  # include every pair only once and omit i == j
+            if i >= j:  # include every pair only once and omit i == j
+                continue

If you flip the logic and use a continue statement, then you can un-indent the rest of the code a bit (which I at least find a bit easier to follow).
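
As a side-by-side illustration of that flip, using a placeholder list rather than the notebook's data:

items = ["a", "b", "c"]

# Nested-if version: the pair-handling code sits one level deeper.
pairs_nested = []
for i, x in enumerate(items):
    for j, y in enumerate(items):
        if i < j:  # include every pair only once and omit i == j
            pairs_nested.append((x, y))

# Guard-clause version: skip unwanted pairs early and keep the body un-indented.
pairs_guarded = []
for i, x in enumerate(items):
    for j, y in enumerate(items):
        if i >= j:  # include every pair only once and omit i == j
            continue
        pairs_guarded.append((x, y))

assert pairs_nested == pairs_guarded == [("a", "b"), ("a", "c"), ("b", "c")]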

for repeat_no in models[rank].keys():
    for i, cp_i in enumerate(models[rank][repeat_no]):
        for j, cp_j in enumerate(models[rank][repeat_no]):
            if i < j:  # include every pair only once and omit i == j

Suggested change
-            if i < j:  # include every pair only once and omit i == j
+            if i >= j:  # include every pair only once and omit i == j
+                continue

Same as above

# present in both subsets.

replicability_stability_alt = {}
for rank in [2, 3, 4, 5]:

Suggested change
-for rank in [2, 3, 4, 5]:
+for rank in models.keys():

Comment on lines +171 to +173
for repeat_no in models[rank].keys():
    for i, cp_i in enumerate(models[rank][repeat_no]):
        for j, cp_j in enumerate(models[rank][repeat_no]):

Suggested change
-for repeat_no in models[rank].keys():
-    for i, cp_i in enumerate(models[rank][repeat_no]):
-        for j, cp_j in enumerate(models[rank][repeat_no]):
+for repeat_no, current_models in enumerate(models[rank]):
+    for i, cp_i in enumerate(current_models):
+        for j, cp_j in enumerate(current_models):

A small change to account for using a list of lists instead of a dict of lists for models[rank]

Comment on lines +139 to +141
for repeat_no in models[rank].keys():
    for i, cp_i in enumerate(models[rank][repeat_no]):
        for j, cp_j in enumerate(models[rank][repeat_no]):

Suggested change
-for repeat_no in models[rank].keys():
-    for i, cp_i in enumerate(models[rank][repeat_no]):
-        for j, cp_j in enumerate(models[rank][repeat_no]):
+for repeat_no, current_models in enumerate(models[rank]):
+    for i, cp_i in enumerate(current_models):
+        for j, cp_j in enumerate(current_models):

A small change to account for using a list of lists instead of a dict of lists for models[rank]

Comment on lines +1 to +61

:orphan:

.. _sphx_glr_sg_execution_times:


Computation times
=================
**01:57.453** total execution time for 9 files **from all galleries**:

.. container::

  .. raw:: html

    <style scoped>
    <link href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.3.0/css/bootstrap.min.css" rel="stylesheet" />
    <link href="https://cdn.datatables.net/1.13.6/css/dataTables.bootstrap5.min.css" rel="stylesheet" />
    </style>
    <script src="https://code.jquery.com/jquery-3.7.0.js"></script>
    <script src="https://cdn.datatables.net/1.13.6/js/jquery.dataTables.min.js"></script>
    <script src="https://cdn.datatables.net/1.13.6/js/dataTables.bootstrap5.min.js"></script>
    <script type="text/javascript" class="init">
    $(document).ready( function () {
        $('table.sg-datatable').DataTable({order: [[1, 'desc']]});
    } );
    </script>

  .. list-table::
   :header-rows: 1
   :class: table table-striped sg-datatable

   * - Example
     - Time
     - Mem (MB)
   * - :ref:`sphx_glr_auto_examples_plot_replicability_analysis.py` (``../examples/plot_replicability_analysis.py``)
     - 01:57.453
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_bike_plotly.py` (``../examples/plot_bike_plotly.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_core_consistency.py` (``../examples/plot_core_consistency.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_labelled_decompositions.py` (``../examples/plot_labelled_decompositions.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_optimisation_diagnostic.py` (``../examples/plot_optimisation_diagnostic.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_outlier_detection.py` (``../examples/plot_outlier_detection.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_selecting_aminoacids_components.py` (``../examples/plot_selecting_aminoacids_components.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_split_half_analysis.py` (``../examples/plot_split_half_analysis.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_working_with_xarray.py` (``../examples/plot_working_with_xarray.py``)
     - 00:00.000
     - 0.0

I'm not sure why this file wasn't ignored in .gitignore already. I don't think it should be committed here. It should be deleted and added to .gitignore.

@MarieRoald (Collaborator)

I updated the CI/CD pipeline since the Python version (and dependencies) we ran with before were old. Merging main into this branch should hopefully make the pipeline succeed :)
