
Conversation

@cchatzis

Hi,

As mentioned in #23, here is an example notebook demonstrating the replicability check approach for selecting the number of components.

Kindly let me know what you think.
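
For readers skimming this thread, here is a minimal sketch of the idea behind the replicability check (not the notebook's actual code): repeatedly split the sample mode, fit a CP/PARAFAC model on each subset, and compare the factors obtained from different subsets with the factor match score, skipping the split mode. The toy data, the plain `parafac` call (the notebook uses a multi-initialisation helper), and the parameter choices below are placeholders.

import numpy as np
import tensorly as tl
from sklearn.model_selection import RepeatedKFold
from tensorly.decomposition import parafac
from tlviz.factor_tools import factor_match_score

# Placeholder data: a samples x J x K tensor and one candidate number of components.
dataset = np.random.standard_normal((30, 20, 10))
rank = 3

rskf = RepeatedKFold(n_splits=2, n_repeats=5, random_state=0)
models = []
for train_index, _ in rskf.split(dataset):
    train = dataset[sorted(train_index)]  # subset of samples
    train = train / tl.norm(train)        # normalise each subset separately
    models.append(parafac(train, rank, init="random", random_state=0))

# Compare models fitted on different subsets; skip the sample mode (mode 0),
# since the subsets contain different samples.
scores = [
    factor_match_score(models[i], models[j], skip_mode=0)
    for i in range(len(models))
    for j in range(i + 1, len(models))
]
print(f"Mean factor match score for rank {rank}: {np.mean(scores):.3f}")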

@cchatzis changed the title from "Added replicability example notebook" to "Added replicability check" on Nov 21, 2025

@blzserdos left a comment


Please see some comments below.

@cchatzis (Author)

Hi, I updated the example following the revision. Let me know what you think!

@cchatzis requested a review from blzserdos on November 28, 2025 at 14:35

@blzserdos left a comment


Looks good! Thanks!

@MarieRoald (Collaborator) left a comment


This looks really good, just a couple of small details to consider :)

Comment on lines +92 to +114
models[rank] = {}
split_indices[rank] = {}

for split_no, (train_index, _) in enumerate(rskf.split(dataset)):
    repeat_no = split_no // splits

    # Sort rows for consistent ordering (not necessary)

    sorted_train_index = sorted(train_index)
    train = dataset[sorted_train_index]

    train = train / tl.norm(train)  # Pre-process the tensor without leaking info from other folds

    current_models = fit_many_parafac(train.data, rank)
    current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train)

    if repeat_no not in models[rank].keys():
        models[rank][repeat_no] = []

    models[rank][repeat_no].append(current_model)

    if repeat_no not in split_indices[rank].keys():
        split_indices[rank][repeat_no] = []

Suggested change
-models[rank] = {}
-split_indices[rank] = {}
-for split_no, (train_index, _) in enumerate(rskf.split(dataset)):
-    repeat_no = split_no // splits
-    # Sort rows for consistent ordering (not necessary)
-    sorted_train_index = sorted(train_index)
-    train = dataset[sorted_train_index]
-    train = train / tl.norm(train)  # Pre-process the tensor without leaking info from other folds
-    current_models = fit_many_parafac(train.data, rank)
-    current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train)
-    if repeat_no not in models[rank].keys():
-        models[rank][repeat_no] = []
-    models[rank][repeat_no].append(current_model)
-    if repeat_no not in split_indices[rank].keys():
-        split_indices[rank][repeat_no] = []
+models[rank] = [[] for _ in range(repeats)]
+split_indices[rank] = [[] for _ in range(repeats)]
+for split_no, (train_index, _) in enumerate(rskf.split(dataset)):
+    repeat_no = split_no // splits
+    # Sort rows for consistent ordering (not necessary)
+    sorted_train_index = sorted(train_index)
+    train = dataset[sorted_train_index]
+    train = train / tl.norm(train)  # Pre-process the tensor without leaking info from other folds
+    current_models = fit_many_parafac(train.data, rank)
+    current_model = tlviz.multimodel_evaluation.get_model_with_lowest_error(current_models, train)
+    models[rank][repeat_no].append(current_model)

Since the keys here are contiguous integers starting at zero, I think it makes more sense to use a list here. By pre-allocating it you can also avoid the initialisation logic. Unless there's a specific reason to keep it as a dictionary, of course.
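
To make the contrast concrete, here is a tiny standalone illustration of the pre-allocation pattern the suggestion describes (the repeats/splits values and the toy loop are placeholders, not the notebook's variables):

repeats, splits = 3, 2

# Dict variant: needs an initialisation check inside the loop.
grouped_dict = {}
for split_no in range(repeats * splits):
    repeat_no = split_no // splits
    if repeat_no not in grouped_dict:
        grouped_dict[repeat_no] = []
    grouped_dict[repeat_no].append(split_no)

# List variant: pre-allocate one empty list per repeat, no check needed.
grouped_list = [[] for _ in range(repeats)]
for split_no in range(repeats * splits):
    grouped_list[split_no // splits].append(split_no)

assert grouped_list == [grouped_dict[r] for r in range(repeats)]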

# Here, we are skipping the mode we split (``mode=0``).

replicability_stability = {}
for rank in [2, 3, 4, 5]:

Suggested change
-for rank in [2, 3, 4, 5]:
+for rank in models.keys():

Since you have this information defined in the models dictionary, you don't need to repeat it here.

for repeat_no in models[rank].keys():
    for i, cp_i in enumerate(models[rank][repeat_no]):
        for j, cp_j in enumerate(models[rank][repeat_no]):
            if i < j:  # include every pair only once and omit i == j

Suggested change
-            if i < j:  # include every pair only once and omit i == j
+            if i >= j:  # include every pair only once and omit i == j
+                continue

If you flip the logic and use a continue statement, then you can un-indent the rest of the code a bit (which I at least find a bit easier to follow).
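
As a side-by-side illustration of that flip, using a placeholder list rather than the notebook's data:

items = ["a", "b", "c"]

# Nested-if version: the pair-handling code sits one level deeper.
pairs_nested = []
for i, x in enumerate(items):
    for j, y in enumerate(items):
        if i < j:  # include every pair only once and omit i == j
            pairs_nested.append((x, y))

# Guard-clause version: skip unwanted pairs early and keep the body un-indented.
pairs_guarded = []
for i, x in enumerate(items):
    for j, y in enumerate(items):
        if i >= j:  # include every pair only once and omit i == j
            continue
        pairs_guarded.append((x, y))

assert pairs_nested == pairs_guarded == [("a", "b"), ("a", "c"), ("b", "c")]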

for repeat_no in models[rank].keys():
    for i, cp_i in enumerate(models[rank][repeat_no]):
        for j, cp_j in enumerate(models[rank][repeat_no]):
            if i < j:  # include every pair only once and omit i == j

Suggested change
-            if i < j:  # include every pair only once and omit i == j
+            if i >= j:  # include every pair only once and omit i == j
+                continue

Same as above

# present in both subsets.

replicability_stability_alt = {}
for rank in [2, 3, 4, 5]:

Suggested change
-for rank in [2, 3, 4, 5]:
+for rank in models.keys():

Comment on lines +171 to +173
for repeat_no in models[rank].keys():
    for i, cp_i in enumerate(models[rank][repeat_no]):
        for j, cp_j in enumerate(models[rank][repeat_no]):

Suggested change
-for repeat_no in models[rank].keys():
-    for i, cp_i in enumerate(models[rank][repeat_no]):
-        for j, cp_j in enumerate(models[rank][repeat_no]):
+for repeat_no, current_models in enumerate(models[rank]):
+    for i, cp_i in enumerate(current_models):
+        for j, cp_j in enumerate(current_models):

A small change to account for using a list of lists instead of a dict of lists for models[rank]

Comment on lines +139 to +141
for repeat_no in models[rank].keys():
    for i, cp_i in enumerate(models[rank][repeat_no]):
        for j, cp_j in enumerate(models[rank][repeat_no]):

Suggested change
-for repeat_no in models[rank].keys():
-    for i, cp_i in enumerate(models[rank][repeat_no]):
-        for j, cp_j in enumerate(models[rank][repeat_no]):
+for repeat_no, current_models in enumerate(models[rank]):
+    for i, cp_i in enumerate(current_models):
+        for j, cp_j in enumerate(current_models):

A small change to account for using a list of lists instead of a dict of lists for models[rank]

Comment on lines +1 to +61

:orphan:

.. _sphx_glr_sg_execution_times:


Computation times
=================
**01:57.453** total execution time for 9 files **from all galleries**:

.. container::

  .. raw:: html

    <style scoped>
    <link href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.3.0/css/bootstrap.min.css" rel="stylesheet" />
    <link href="https://cdn.datatables.net/1.13.6/css/dataTables.bootstrap5.min.css" rel="stylesheet" />
    </style>
    <script src="https://code.jquery.com/jquery-3.7.0.js"></script>
    <script src="https://cdn.datatables.net/1.13.6/js/jquery.dataTables.min.js"></script>
    <script src="https://cdn.datatables.net/1.13.6/js/dataTables.bootstrap5.min.js"></script>
    <script type="text/javascript" class="init">
    $(document).ready( function () {
        $('table.sg-datatable').DataTable({order: [[1, 'desc']]});
    } );
    </script>

  .. list-table::
   :header-rows: 1
   :class: table table-striped sg-datatable

   * - Example
     - Time
     - Mem (MB)
   * - :ref:`sphx_glr_auto_examples_plot_replicability_analysis.py` (``../examples/plot_replicability_analysis.py``)
     - 01:57.453
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_bike_plotly.py` (``../examples/plot_bike_plotly.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_core_consistency.py` (``../examples/plot_core_consistency.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_labelled_decompositions.py` (``../examples/plot_labelled_decompositions.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_optimisation_diagnostic.py` (``../examples/plot_optimisation_diagnostic.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_outlier_detection.py` (``../examples/plot_outlier_detection.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_selecting_aminoacids_components.py` (``../examples/plot_selecting_aminoacids_components.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_split_half_analysis.py` (``../examples/plot_split_half_analysis.py``)
     - 00:00.000
     - 0.0
   * - :ref:`sphx_glr_auto_examples_plot_working_with_xarray.py` (``../examples/plot_working_with_xarray.py``)
     - 00:00.000
     - 0.0

I'm not sure why this file wasn't ignored in .gitignore already. I don't think it should be committed here. It should be deleted and added to .gitignore.

@MarieRoald (Collaborator)

I updated the CI/CD pipeline since the Python version (and dependencies) we ran with before were old. Merging main into this branch should hopefully make the pipeline succeed :)
