Updated #33

Merged
afloresep merged 7 commits into
afloresep:master from
daenuprobst:master
Jun 11, 2026

Conversation

@daenuprobst

Contributor

Two main changes to the layout pipeline:

  1. Per-level adaptive layout + crossing-reduction post-pass (C++)
  • Added untangle.hpp post-pass that greedily reduces edge crossings by rotating/reflecting subtrees about their parents, using a spatial grid for fast crossing detection. Configurable via UntangleMode (Rotate/Reflect), pass count, rotation steps, max angle, and bounded stem-slide.
  • Added adaptive per-level layout knobs in LayoutConfig (adaptive, ns_cap, ns_coef, ns_exp, quad_rotate) plus the untangle_* parameters, exposed through the Python bindings and tmap.layout (UntangleMode).
  2. Disconnected-graph bridging (Python)
  • New connect_knn_components in graph/connect.py: when a too-low n_neighbors fragments the kNN graph into multiple components, it adds minimum-weight cross-component bridge edges so the result is a single connected tree (keeps path/distance well-defined and the layout connected).
  • TMAP gains a connect_components constructor flag (default True), n_components_/n_bridges_ properties, and a UserWarning when bridges are added. You might want to check performance.
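For reviewers who want to see the bridging idea in isolation, here is a minimal sketch. It is not the code in graph/connect.py: `bridge_components_sketch` is a hypothetical helper, it assumes Euclidean distances on raw points, and it uses a brute-force search that would not scale to the 200k-row example. It grows the component containing node 0 by repeatedly adding the shortest edge to any node outside it.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def bridge_components_sketch(points, edges):
    """Return (u, v, dist) bridge edges that make the graph connected.

    Hypothetical illustration only; assumes Euclidean distance and uses
    an O(n^2) brute-force search per bridge.
    """
    points = np.asarray(points, dtype=float)
    edges = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
    n = len(points)

    # Label the connected components of the undirected graph.
    adj = coo_matrix(
        (np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n)
    )
    n_comp, labels = connected_components(adj, directed=False)

    bridges = []
    for _ in range(n_comp - 1):
        inside = np.flatnonzero(labels == labels[0])
        outside = np.flatnonzero(labels != labels[0])

        # Closest (inside, outside) pair by Euclidean distance.
        diff = points[inside, None, :] - points[None, outside, :]
        dist = np.linalg.norm(diff, axis=2)
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        u, v = int(inside[i]), int(outside[j])
        bridges.append((u, v, float(dist[i, j])))

        # Merge the component that v belongs to into node 0's component.
        labels[labels == labels[v]] = labels[0]
    return bridges
```

With three components on a line, `bridge_components_sketch([[0, 0], [1, 0], [10, 0], [11, 0], [20, 0]], [[0, 1], [2, 3]])` adds two bridges, one per extra component.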

I also included some new examples/tests (untangle_demo.py, adaptive_coef_exp_3panel.py, molecules_tmap_legacy.py, test_connect.py) and a 200k-row ChEMBL example dataset.

daenuprobst and others added 4 commits June 9, 2026 22:42
The disconnected-graph check ran a pure-Python union-find over all n*k
edges on every fit, even when the graph was already connected. Replace it
with scipy.sparse.csgraph.connected_components (already a dependency), seed
the union-find from the resulting labels, and vectorize _labels.

Component detection on a connected 200k x 20 graph: ~2.9s -> ~0.16s (~19x).
Outputs unchanged (same components, bridges, connectivity); tests/test_connect.py
unit tests pass.

_make_layout_config stamped deterministic=True/seed onto the layout config
only when none was passed; a user-supplied layout_config was returned
verbatim with its default deterministic=False, so the layout ran
multi-threaded and nondeterministically even under reproducible=True.

Stamp deterministic/seed in both paths so the estimator owns layout
reproducibility regardless of whether a config is passed (the layout's own
multi-threading gives no measurable speedup, so this is free). Document that
these fields are set in place on the passed object, and add a regression
test asserting a reproducible=True fit with a passed LayoutConfig() is
bit-identical across runs.
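The fix described in this commit message can be sketched as follows. This is an illustration under assumptions, not the repository's code: `LayoutConfigSketch` is a stand-in dataclass for the real LayoutConfig, and the signature of `make_layout_config_sketch` (including the default seed) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayoutConfigSketch:
    """Stand-in for the real LayoutConfig (hypothetical fields only)."""

    deterministic: bool = False
    seed: Optional[int] = None


def make_layout_config_sketch(layout_config=None, *, reproducible=True, seed=42):
    """Stamp deterministic/seed whether or not a config was passed."""
    cfg = layout_config if layout_config is not None else LayoutConfigSketch()
    if reproducible:
        # Set in place on the passed object, so the caller sees the change.
        cfg.deterministic = True
        cfg.seed = seed
    return cfg
```

The point of the sketch is that both branches reach the same stamping code, so a user-supplied config can no longer bypass it.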

@afloresep afloresep left a comment

Owner

Everything looks good. I'll replace some of the pure-Python code in connect.py with SciPy code, which is implemented in C and is much faster while achieving the same result.

per my testing:
(scipy) : 1 components in 11.9 ms
(python union) : 1 components in 467.8 ms
speedup : 39x counts agree
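A comparison like the one above can be reproduced roughly as follows. This is a sketch, not the script used for the timings: both function names are hypothetical, and only `scipy.sparse.csgraph.connected_components` is the call named in this PR.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def count_components_scipy(n, edges):
    """Count components with SciPy's compiled routine."""
    edges = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
    adj = coo_matrix(
        (np.ones(len(edges)), (edges[:, 0], edges[:, 1])), shape=(n, n)
    )
    n_comp, _labels = connected_components(adj, directed=False)
    return int(n_comp)


def count_components_union_find(n, edges):
    """Count components with a pure-Python union-find (the slow path)."""
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    count = n
    for u, v in np.asarray(edges, dtype=np.int64).reshape(-1, 2).tolist():
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            count -= 1
    return count
```

Wrapping each call in `time.perf_counter()` on the real kNN edge list gives the two timings; the counts should agree on any input.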

@afloresep

Owner

I'm also adding some changes to the reproducible, seed and OGDF seed knobs because we were silently breaking reproducibility. Right now it's a bit of a mess: OGDF can be deterministic, and so can the MinHash+LSH Forest, but USearch HNSW is deterministic only if it runs single-threaded, and thus slowly. So there are three knobs for the same thing. I'll probably collapse the whole thing into just 'seed' and add a warning saying that reproducibility with USearch as the backend comes with longer running times.
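One possible shape for that single-knob design, purely as a sketch: `resolve_reproducibility_sketch`, the backend string "usearch", and the returned keys are all hypothetical and do not describe the current API.

```python
import warnings


def resolve_reproducibility_sketch(seed, backend):
    """Derive all reproducibility settings from one `seed` value.

    Hypothetical helper: `backend` names and returned keys are assumptions.
    """
    if seed is None:
        # No seed means no reproducibility guarantee and no restrictions.
        return {"deterministic": False, "seed": None, "single_threaded_index": False}

    single_threaded = backend == "usearch"
    if single_threaded:
        warnings.warn(
            "Reproducible runs with the USearch backend build the index "
            "single-threaded and will take longer.",
            UserWarning,
            stacklevel=2,
        )
    return {"deterministic": True, "seed": seed, "single_threaded_index": single_threaded}
```

The warning fires only when a seed is combined with the backend that has to give up multi-threading to honor it.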

…t OGDF

test_estimator.py imported LayoutConfig at module top, but tmap.layout only
exports it when the OGDF extension is built. pytest-core runs without OGDF, so
collection failed with ImportError, failing the whole job. Import it inside the
OGDF-gated regression test instead (matching test_end_to_end.py /
test_layout_ogdf.py), so collection never references it when OGDF is absent.

Pre-existing one-line-docstring and formatting misses in the PR's new files,
flagged by the repo's `ruff format --check src/ tests/`.

The lint job runs `ruff format --check` then `ruff check`; the prior format
failure masked this pre-existing UP035 in connect.py. typing.Callable is
deprecated in favor of collections.abc.Callable.
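The deferred-import fix from the test_estimator.py commit above follows a common pattern, sketched here. The repository's tests use pytest; this sketch swaps in the standard library's unittest so it runs without extra dependencies. `has_attr` and the test class are hypothetical names.

```python
import importlib
import unittest


def has_attr(module_name, attr):
    """Return True if `module_name` imports and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


# Evaluated at collection time without ever raising ImportError.
HAS_LAYOUT_CONFIG = has_attr("tmap.layout", "LayoutConfig")


class ReproducibleLayoutTest(unittest.TestCase):
    @unittest.skipUnless(HAS_LAYOUT_CONFIG, "OGDF extension not built")
    def test_layout_config_is_available(self):
        # Deferred import: only executed when the gate above passed.
        from tmap.layout import LayoutConfig

        self.assertTrue(callable(LayoutConfig))
```

Because the module top level never names LayoutConfig directly, collection succeeds when OGDF is absent and the test is reported as skipped instead of failing the job.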
@afloresep afloresep merged commit ce43664 into afloresep:master Jun 11, 2026
6 checks passed
@daenuprobst

Contributor Author

> Everything looks good. I'll replace some of the pure-Python code in connect.py with SciPy code, which is implemented in C and is much faster while achieving the same result.
>
> per my testing: (scipy) : 1 components in 11.9 ms (python union) : 1 components in 467.8 ms speedup : 39x counts agree

Oh crap, I forgot to port that one. I was experimenting in Python to avoid building it again and again.
