BE-301: HashQL: Implement size estimation analysis for MIR #8278

Open
indietyp wants to merge 32 commits into main from bm/be-301-hashql-size-estimation-for-local-variables-and-functions

@indietyp
Member

🌟 What is the purpose of this PR?

This PR adds a size estimation analysis to the MIR framework that can statically determine the size of values flowing through a program. The analysis helps predict memory usage and performance characteristics of queries.

🔍 What does this change?

  • Implements a two-phase size estimation analysis:
    • Static analysis: Estimates sizes purely from type information
    • Dynamic analysis: Uses dataflow to track how sizes propagate through the program
  • Adds support for tracking parameter-dependent sizes using affine equations
  • Introduces range types for representing bounded and unbounded size estimates
  • Updates the MIR builder guide to document new type syntax for lists and unknown types
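
To make the range and affine-equation bullets concrete, here is a minimal sketch of how a parameter-dependent size could be represented and instantiated at a call site. The names SizeRange, AffineSize, plus, scale and evaluate are hypothetical and are not taken from the PR; the sketch also assumes non-negative integer coefficients and saturating arithmetic.

```rust
/// A size estimate: inclusive lower bound plus an optional upper bound.
/// `max == None` stands for "unbounded". (Hypothetical type, not the PR's.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SizeRange {
    pub min: u64,
    pub max: Option<u64>,
}

impl SizeRange {
    pub const fn exact(n: u64) -> Self {
        Self { min: n, max: Some(n) }
    }

    pub const fn at_least(min: u64) -> Self {
        Self { min, max: None }
    }

    /// Adds two ranges; the sum is unbounded if either operand is.
    pub fn plus(self, other: Self) -> Self {
        Self {
            min: self.min.saturating_add(other.min),
            max: match (self.max, other.max) {
                (Some(a), Some(b)) => Some(a.saturating_add(b)),
                _ => None,
            },
        }
    }

    /// Multiplies a range by a non-negative coefficient.
    pub fn scale(self, k: u64) -> Self {
        if k == 0 {
            // A zero coefficient removes the dependency on that parameter.
            return Self::exact(0);
        }
        Self {
            min: self.min.saturating_mul(k),
            max: self.max.map(|m| m.saturating_mul(k)),
        }
    }
}

/// size = constant + sum(coefficients[i] * size_of(param[i]))
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AffineSize {
    pub constant: SizeRange,
    pub coefficients: Vec<u64>,
}

impl AffineSize {
    /// Instantiates the equation with the argument sizes seen at a call site.
    pub fn evaluate(&self, params: &[SizeRange]) -> SizeRange {
        self.coefficients
            .iter()
            .zip(params)
            .fold(self.constant, |acc, (&k, &p)| acc.plus(p.scale(k)))
    }
}
```

Under these assumptions, a body whose result is "8 units plus twice the first parameter" evaluates to exactly 14 units for a 3-unit argument, and stays unbounded above whenever a parameter with a non-zero coefficient is unbounded.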

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

🛡 What tests cover this?

  • Comprehensive unit tests for all components of the size estimation system
  • Integration tests for multi-function analysis and recursive functions
  • Snapshot tests to verify analysis results

@cursor

cursor bot commented Jan 20, 2026

PR Summary

Medium Risk
Adds a new global MIR analysis and extends the generic dataflow framework/signatures (boundary initialization, metadata-based iteration controls, lattice trait generics), which may impact other analyses and fixpoint behavior despite being covered by extensive tests.

Overview
Adds a new MIR size estimation analysis that computes per-body Footprints (units + cardinality) in two phases: static sizing from types, then a forward dataflow fallback for dynamic, unknown, and intrinsic types. Parameter-dependent results are modeled as affine equations, and fixpoint iteration over the call graph is SCC-aware.
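
The fixpoint part of this overview can be illustrated with a small sketch that solves a single strongly connected component of the call graph. Everything here is an assumption made for illustration: the Range and Footprint layouts, the solve_scc function, the use of None as the "not yet computed" bottom value, and the crude widening step that gives up on an upper bound once it keeps growing. The PR may handle termination differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub min: u64,
    pub max: Option<u64>, // None = unbounded
}

impl Range {
    /// Least upper bound: the smallest range covering both inputs.
    pub fn join(self, other: Self) -> Self {
        Range {
            min: self.min.min(other.min),
            max: match (self.max, other.max) {
                (Some(a), Some(b)) => Some(a.max(b)),
                _ => None,
            },
        }
    }

    /// Drops the upper bound if it moved since the previous round.
    fn widen_from(self, previous: Self) -> Self {
        let max = if self.max == previous.max { self.max } else { None };
        Range { min: self.min, max }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Footprint {
    pub units: Range,
    pub cardinality: Range,
}

impl Footprint {
    pub fn join(self, other: Self) -> Self {
        Footprint {
            units: self.units.join(other.units),
            cardinality: self.cardinality.join(other.cardinality),
        }
    }

    fn widen_from(self, previous: Self) -> Self {
        Footprint {
            units: self.units.widen_from(previous.units),
            cardinality: self.cardinality.widen_from(previous.cardinality),
        }
    }
}

/// Re-evaluates every body of one SCC until no estimate changes.
/// `transfer(i, estimates)` recomputes body `i` from the current estimates.
pub fn solve_scc<F>(members: usize, transfer: F, max_rounds: usize) -> Vec<Option<Footprint>>
where
    F: Fn(usize, &[Option<Footprint>]) -> Option<Footprint>,
{
    let mut estimates: Vec<Option<Footprint>> = vec![None; members];
    for round in 0usize.. {
        let mut changed = false;
        for body in 0..members {
            let current = estimates[body];
            let next = match (current, transfer(body, &estimates)) {
                (None, computed) => computed,
                (old, None) => old,
                (Some(old), Some(new)) => {
                    let joined = old.join(new);
                    // After `max_rounds`, stop chasing a growing upper bound.
                    Some(if round >= max_rounds { joined.widen_from(old) } else { joined })
                }
            };
            if next != current {
                estimates[body] = next;
                changed = true;
            }
        }
        if !changed {
            break;
        }
    }
    estimates
}
```

In this sketch, a self-recursive body whose result is one unit larger than that of its recursive call ends up with an unbounded upper bound on units, while a cardinality that never changes keeps its exact value. A full solver would presumably process SCCs in an order where callees are solved before their callers, so that only the bodies inside a cycle need this iteration.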

Extends the dataflow framework to support optional per-analysis iteration metadata and edge/block gating (initialize_metadata, should_process_block, should_propagate_between) and updates trait bounds/signatures (notably passing an allocator into initialize_boundary).
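
As a rough illustration of what such gating hooks could look like, the sketch below wires them into a simple worklist driver. Only the hook names initialize_metadata, should_process_block and should_propagate_between come from the summary above; the trait shape, all signatures, the run driver and the VisitCapped example analysis are assumptions and do not reflect the actual hashql-mir API.

```rust
use std::collections::VecDeque;

/// Forward dataflow analysis with optional iteration hooks.
/// Every signature in this trait is an assumption made for the sketch.
pub trait Analysis {
    type Domain: Clone;
    type Metadata;

    fn bottom(&self) -> Self::Domain;
    /// Joins `from` into `into`; returns true if `into` changed.
    fn join(&self, into: &mut Self::Domain, from: &Self::Domain) -> bool;
    fn transfer(&self, block: usize, state: &mut Self::Domain);

    fn initialize_metadata(&self, block_count: usize) -> Self::Metadata;
    /// Block gate: returning false skips this visit of the block.
    fn should_process_block(&self, _meta: &mut Self::Metadata, _block: usize) -> bool {
        true
    }
    /// Edge gate: returning false stops state flowing along `from -> to`.
    fn should_propagate_between(&self, _meta: &Self::Metadata, _from: usize, _to: usize) -> bool {
        true
    }
}

/// Worklist driver; returns the entry state of every block.
pub fn run<A: Analysis>(analysis: &A, successors: &[Vec<usize>], entry: usize) -> Vec<A::Domain> {
    let mut meta = analysis.initialize_metadata(successors.len());
    let mut entry_states = vec![analysis.bottom(); successors.len()];
    let mut seen = vec![false; successors.len()];
    let mut queue = VecDeque::from([entry]);
    seen[entry] = true;

    while let Some(block) = queue.pop_front() {
        if !analysis.should_process_block(&mut meta, block) {
            continue;
        }
        let mut state = entry_states[block].clone();
        analysis.transfer(block, &mut state);
        for &succ in &successors[block] {
            if !analysis.should_propagate_between(&meta, block, succ) {
                continue;
            }
            let changed = analysis.join(&mut entry_states[succ], &state);
            // Visit a successor when its state grew or it was never queued.
            if changed || !seen[succ] {
                seen[succ] = true;
                queue.push_back(succ);
            }
        }
    }
    entry_states
}

/// Example analysis: counts executed blocks, capping visits per block.
pub struct VisitCapped {
    pub max_visits: u32,
}

impl Analysis for VisitCapped {
    type Domain = u32;
    type Metadata = Vec<u32>;

    fn bottom(&self) -> u32 {
        0
    }
    fn join(&self, into: &mut u32, from: &u32) -> bool {
        let changed = *from > *into;
        *into = (*into).max(*from);
        changed
    }
    fn transfer(&self, _block: usize, state: &mut u32) {
        *state += 1;
    }
    fn initialize_metadata(&self, block_count: usize) -> Vec<u32> {
        vec![0; block_count]
    }
    fn should_process_block(&self, meta: &mut Vec<u32>, block: usize) -> bool {
        meta[block] += 1;
        meta[block] <= self.max_visits
    }
}
```

On a graph with a self-loop, the counting domain would never stabilise on its own, so the per-block visit counter stored in the metadata is what ends the iteration. The result is therefore a truncated approximation rather than a true fixpoint, which is the kind of trade-off an iteration-control hook makes explicit.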

Introduces supporting primitives across core/MIR (e.g., small_vec_from_elem, improved allocator-aware Vec cloning, IdVec::into_iter_enumerated, more flexible IdVec comparisons, JoinSemiLattice/HasBottom/HasTop generic tweaks), expands the body! macro/type syntax to include ? and [List T], and adds comprehensive unit + snapshot tests for the new analysis.

Written by Cursor Bugbot for commit ec4abcb.

github-actions bot added the area/infra, area/libs, and type/eng > backend labels Jan 20, 2026
TimDiekmann previously approved these changes Jan 23, 2026
Member

Agreed, a trait would be overkill. I know I put quite a lot of work into the bound comparisons for temporal versioning, but I guess this became a lot easier nowadays. I guess it makes sense to move out common things like this but I always struggle to find a good location.
Let's leave it as it is.

@indietyp indietyp force-pushed the bm/be-273-hashql-interpreter-benchmarks branch from d7b72f2 to ca9cc26 Compare January 25, 2026 11:15
@indietyp indietyp force-pushed the bm/be-301-hashql-size-estimation-for-local-variables-and-functions branch from 9594e78 to 2fe1a32 Compare January 25, 2026 11:15
@indietyp indietyp force-pushed the bm/be-273-hashql-interpreter-benchmarks branch from ca9cc26 to d756493 Compare January 29, 2026 20:48
@indietyp indietyp force-pushed the bm/be-301-hashql-size-estimation-for-local-variables-and-functions branch from 2fe1a32 to 6f5ce87 Compare January 29, 2026 20:48
@indietyp indietyp force-pushed the bm/be-301-hashql-size-estimation-for-local-variables-and-functions branch from 6f5ce87 to 784af93 Compare January 30, 2026 10:36
@github-actions github-actions bot dismissed TimDiekmann’s stale review January 30, 2026 10:36

Your organization requires reapproval when changes are made, so Graphite has dismissed approvals. See the output of git range-diff at https://github.com/hashintel/hash/actions/runs/21512892173

Base automatically changed from bm/be-273-hashql-interpreter-benchmarks to main January 30, 2026 11:05
@indietyp indietyp force-pushed the bm/be-301-hashql-size-estimation-for-local-variables-and-functions branch from 784af93 to f0ffcbc Compare January 30, 2026 23:24
@indietyp indietyp force-pushed the bm/be-301-hashql-size-estimation-for-local-variables-and-functions branch from f0ffcbc to ec4abcb Compare February 2, 2026 16:36
@github-actions
Contributor

github-actions bot commented Feb 2, 2026

Benchmark results

@rust/hash-graph-benches – Integrations

policy_resolution_large

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 2002 $$26.7 \mathrm{ms} \pm 152 \mathrm{μs}\left({\color{lightgreen}-24.482 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$3.22 \mathrm{ms} \pm 14.1 \mathrm{μs}\left({\color{gray}-1.045 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 1001 $$12.0 \mathrm{ms} \pm 67.1 \mathrm{μs}\left({\color{gray}1.11 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 3314 $$42.6 \mathrm{ms} \pm 317 \mathrm{μs}\left({\color{gray}0.877 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$13.6 \mathrm{ms} \pm 85.7 \mathrm{μs}\left({\color{gray}-1.317 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 1526 $$23.5 \mathrm{ms} \pm 173 \mathrm{μs}\left({\color{gray}1.86 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 2078 $$42.6 \mathrm{ms} \pm 244 \mathrm{μs}\left({\color{lightgreen}-13.464 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$19.9 \mathrm{ms} \pm 115 \mathrm{μs}\left({\color{lightgreen}-5.603 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 1033 $$27.5 \mathrm{ms} \pm 164 \mathrm{μs}\left({\color{lightgreen}-16.046 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_medium

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 102 $$3.71 \mathrm{ms} \pm 21.4 \mathrm{μs}\left({\color{gray}2.86 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.79 \mathrm{ms} \pm 10.4 \mathrm{μs}\left({\color{gray}-0.634 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 51 $$3.21 \mathrm{ms} \pm 13.7 \mathrm{μs}\left({\color{gray}1.98 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 269 $$4.94 \mathrm{ms} \pm 25.1 \mathrm{μs}\left({\color{gray}-1.372 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$3.39 \mathrm{ms} \pm 20.0 \mathrm{μs}\left({\color{gray}1.30 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 107 $$3.91 \mathrm{ms} \pm 20.2 \mathrm{μs}\left({\color{gray}-1.297 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 133 $$4.27 \mathrm{ms} \pm 19.1 \mathrm{μs}\left({\color{red}6.42 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$3.27 \mathrm{ms} \pm 13.7 \mathrm{μs}\left({\color{gray}1.74 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 63 $$3.90 \mathrm{ms} \pm 19.4 \mathrm{μs}\left({\color{gray}1.68 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_none

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 2 $$2.37 \mathrm{ms} \pm 9.45 \mathrm{μs}\left({\color{gray}0.272 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.34 \mathrm{ms} \pm 10.6 \mathrm{μs}\left({\color{gray}1.09 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 1 $$2.43 \mathrm{ms} \pm 12.6 \mathrm{μs}\left({\color{gray}0.661 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 8 $$2.63 \mathrm{ms} \pm 13.2 \mathrm{μs}\left({\color{gray}0.196 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$2.49 \mathrm{ms} \pm 9.06 \mathrm{μs}\left({\color{gray}-0.778 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 3 $$2.67 \mathrm{ms} \pm 9.20 \mathrm{μs}\left({\color{gray}-0.860 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_small

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 52 $$2.78 \mathrm{ms} \pm 14.7 \mathrm{μs}\left({\color{gray}-0.343 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.40 \mathrm{ms} \pm 10.5 \mathrm{μs}\left({\color{gray}0.394 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 25 $$2.58 \mathrm{ms} \pm 10.0 \mathrm{μs}\left({\color{gray}-1.127 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 94 $$3.08 \mathrm{ms} \pm 12.5 \mathrm{μs}\left({\color{gray}-0.062 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$2.65 \mathrm{ms} \pm 11.7 \mathrm{μs}\left({\color{gray}-0.065 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 26 $$2.87 \mathrm{ms} \pm 12.5 \mathrm{μs}\left({\color{gray}-0.074 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 66 $$3.04 \mathrm{ms} \pm 15.4 \mathrm{μs}\left({\color{gray}-0.221 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$2.62 \mathrm{ms} \pm 12.5 \mathrm{μs}\left({\color{gray}-0.509 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 29 $$2.90 \mathrm{ms} \pm 18.2 \mathrm{μs}\left({\color{gray}0.989 \mathrm{\%}}\right) $$ Flame Graph

read_scaling_complete

Function Value Mean Flame graphs
entity_by_id;one_depth 1 entities $$39.2 \mathrm{ms} \pm 153 \mathrm{μs}\left({\color{gray}2.49 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 10 entities $$76.7 \mathrm{ms} \pm 355 \mathrm{μs}\left({\color{gray}-0.412 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 25 entities $$44.7 \mathrm{ms} \pm 171 \mathrm{μs}\left({\color{gray}0.947 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 5 entities $$46.8 \mathrm{ms} \pm 207 \mathrm{μs}\left({\color{gray}2.25 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 50 entities $$53.6 \mathrm{ms} \pm 243 \mathrm{μs}\left({\color{gray}-0.532 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 1 entities $$41.1 \mathrm{ms} \pm 158 \mathrm{μs}\left({\color{gray}-2.695 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 10 entities $$422 \mathrm{ms} \pm 817 \mathrm{μs}\left({\color{gray}1.36 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 25 entities $$93.2 \mathrm{ms} \pm 394 \mathrm{μs}\left({\color{gray}-3.687 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 5 entities $$84.8 \mathrm{ms} \pm 311 \mathrm{μs}\left({\color{gray}-0.380 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 50 entities $$278 \mathrm{ms} \pm 507 \mathrm{μs}\left({\color{lightgreen}-12.584 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 1 entities $$14.6 \mathrm{ms} \pm 61.0 \mathrm{μs}\left({\color{gray}-0.649 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 10 entities $$14.6 \mathrm{ms} \pm 64.1 \mathrm{μs}\left({\color{gray}0.174 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 25 entities $$14.9 \mathrm{ms} \pm 75.0 \mathrm{μs}\left({\color{gray}-1.038 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 5 entities $$14.5 \mathrm{ms} \pm 60.5 \mathrm{μs}\left({\color{gray}-2.020 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 50 entities $$17.4 \mathrm{ms} \pm 110 \mathrm{μs}\left({\color{gray}-4.823 \mathrm{\%}}\right) $$ Flame Graph

read_scaling_linkless

Function Value Mean Flame graphs
entity_by_id 1 entities $$14.7 \mathrm{ms} \pm 68.4 \mathrm{μs}\left({\color{gray}-1.075 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10 entities $$14.4 \mathrm{ms} \pm 67.7 \mathrm{μs}\left({\color{gray}-2.795 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 100 entities $$14.3 \mathrm{ms} \pm 64.4 \mathrm{μs}\left({\color{gray}-4.209 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1000 entities $$15.1 \mathrm{ms} \pm 84.2 \mathrm{μs}\left({\color{gray}-1.178 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10000 entities $$22.0 \mathrm{ms} \pm 151 \mathrm{μs}\left({\color{gray}-4.249 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity

Function Value Mean Flame graphs
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 $$29.0 \mathrm{ms} \pm 285 \mathrm{μs}\left({\color{gray}-2.402 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 $$29.1 \mathrm{ms} \pm 286 \mathrm{μs}\left({\color{gray}-3.593 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 $$29.4 \mathrm{ms} \pm 247 \mathrm{μs}\left({\color{gray}0.495 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 $$28.6 \mathrm{ms} \pm 261 \mathrm{μs}\left({\color{gray}-0.097 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 $$29.5 \mathrm{ms} \pm 299 \mathrm{μs}\left({\color{gray}-2.501 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 $$30.0 \mathrm{ms} \pm 242 \mathrm{μs}\left({\color{gray}-3.007 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 $$30.1 \mathrm{ms} \pm 265 \mathrm{μs}\left({\color{gray}3.47 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 $$29.9 \mathrm{ms} \pm 292 \mathrm{μs}\left({\color{gray}2.37 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 $$29.2 \mathrm{ms} \pm 282 \mathrm{μs}\left({\color{gray}-2.300 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity_type

Function Value Mean Flame graphs
get_entity_type_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba $$7.95 \mathrm{ms} \pm 37.0 \mathrm{μs}\left({\color{gray}-2.520 \mathrm{\%}}\right) $$ Flame Graph

representative_read_multiple_entities

Function Value Mean Flame graphs
entity_by_property traversal_paths=0 0 $$44.9 \mathrm{ms} \pm 224 \mathrm{μs}\left({\color{lightgreen}-5.211 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=255 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true $$92.2 \mathrm{ms} \pm 353 \mathrm{μs}\left({\color{gray}-2.355 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false $$50.6 \mathrm{ms} \pm 332 \mathrm{μs}\left({\color{lightgreen}-5.808 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true $$58.8 \mathrm{ms} \pm 369 \mathrm{μs}\left({\color{lightgreen}-5.442 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true $$67.2 \mathrm{ms} \pm 501 \mathrm{μs}\left({\color{gray}-4.357 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true $$73.4 \mathrm{ms} \pm 426 \mathrm{μs}\left({\color{gray}-3.997 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=0 0 $$48.9 \mathrm{ms} \pm 267 \mathrm{μs}\left({\color{gray}-2.295 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=255 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true $$75.0 \mathrm{ms} \pm 341 \mathrm{μs}\left({\color{gray}-2.937 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false $$56.3 \mathrm{ms} \pm 352 \mathrm{μs}\left({\color{gray}-3.040 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true $$63.4 \mathrm{ms} \pm 390 \mathrm{μs}\left({\color{gray}-3.593 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true $$64.8 \mathrm{ms} \pm 350 \mathrm{μs}\left({\color{gray}-3.858 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true $$65.0 \mathrm{ms} \pm 382 \mathrm{μs}\left({\color{gray}-3.131 \mathrm{\%}}\right) $$

scenarios

Function Value Mean Flame graphs
full_test query-limited $$128 \mathrm{ms} \pm 597 \mathrm{μs}\left({\color{gray}0.918 \mathrm{\%}}\right) $$ Flame Graph
full_test query-unlimited $$128 \mathrm{ms} \pm 552 \mathrm{μs}\left({\color{gray}1.34 \mathrm{\%}}\right) $$ Flame Graph
linked_queries query-limited $$104 \mathrm{ms} \pm 499 \mathrm{μs}\left({\color{red}149 \mathrm{\%}}\right) $$ Flame Graph
linked_queries query-unlimited $$600 \mathrm{ms} \pm 3.58 \mathrm{ms}\left({\color{red}7.44 \mathrm{\%}}\right) $$ Flame Graph

Labels

area/infra, area/libs, area/tests, type/eng > backend

2 participants