BE-359: Refactor entity embedding generation to use entity IDs instead of full entities#8345

Open
TimDiekmann wants to merge 2 commits into t/be-310-dont-allow-filtering-users-by-email from t/be-359-handle-large-entities-in-temporal-update-embeddings-workflow

@TimDiekmann
Member

🌟 What is the purpose of this PR?

This PR improves entity embedding security by moving property filtering from the backend to the worker. Instead of sending full entities to the worker, we now only send entity IDs and let the worker fetch and filter the entities, ensuring sensitive data is properly excluded from embeddings.

🔍 What does this change?

  • Refactors the embedding workflow to accept entity IDs instead of full entity objects
  • Moves property filtering logic from Rust to TypeScript in the worker
  • Adds support for configurable embedding exclusions based on entity type
  • Applies exclusions in the worker immediately before embedding generation
  • Increases chunk size for entity ID batches (from 100 entities to 10,000 IDs)
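
As a rough illustration of the worker-side filtering described in this list, the sketch below strips excluded properties from an entity before its properties are embedded. It is not the PR's code: `EntityLike`, `toBaseUrl`, and `filterPropertiesForEmbedding` are hypothetical names, and it assumes `embeddingExclusions` maps an entity-type base URL to the property base URLs to drop, and that a versioned URL is its base URL followed by `v/<number>`.

```typescript
// Hypothetical shapes for illustration; the real types live in the HASH codebase.
type BaseUrl = string;
type EmbeddingExclusions = Record<BaseUrl, BaseUrl[]>;

interface EntityLike {
  entityTypeIds: string[]; // versioned URLs, assumed to end in "v/<number>"
  properties: Record<BaseUrl, unknown>;
}

// Assumption: a versioned URL is its base URL followed by "v/<number>".
const toBaseUrl = (versionedUrl: string): BaseUrl =>
  versionedUrl.replace(/v\/\d+$/, "");

function filterPropertiesForEmbedding(
  entity: EntityLike,
  exclusions: EmbeddingExclusions,
): Record<BaseUrl, unknown> {
  // Collect the excluded property base URLs for every type the entity has.
  const excluded = new Set<BaseUrl>();
  for (const typeId of entity.entityTypeIds) {
    for (const propertyUrl of exclusions[toBaseUrl(typeId)] ?? []) {
      excluded.add(propertyUrl);
    }
  }

  // Copy over only the properties that are not excluded.
  const filtered: Record<BaseUrl, unknown> = {};
  for (const key of Object.keys(entity.properties)) {
    if (!excluded.has(key)) {
      filtered[key] = entity.properties[key];
    }
  }
  return filtered;
}
```

Collecting exclusions across all of an entity's types means a property excluded for any one of its types is dropped, which errs on the side of leaving sensitive data out of the embedding input.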

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • modifies a Cargo-publishable library, but it is not yet ready to publish

📜 Does this require a change to the docs?

The changes in this PR:

  • are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

🛡 What tests cover this?

  • Existing tests for embedding generation

❓ How to test this?

  1. Check out the branch
  2. Create entities with properties that should be excluded from embeddings
  3. Verify that the excluded properties are not included in the generated embeddings

Avoids Temporal payload size limits by sending lightweight entity IDs
(~100 bytes each) rather than full entity objects (potentially MBs).
The TypeScript workflow fetches entities and applies property exclusions
dynamically based on the config passed from Rust.

- Add `embeddingExclusions` parameter to workflow for dynamic filtering
- Remove Rust-side `filter_entities_for_embedding` function
- Chunk entity IDs at 10,000 per workflow invocation
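
The chunking itself happens on the Rust side in this PR; the TypeScript sketch below only illustrates the idea, and `chunk` and `MAX_IDS_PER_WORKFLOW` are hypothetical names. Taking the estimate above of about 100 bytes per ID, a full chunk of 10,000 IDs comes to roughly 1 MB per workflow invocation.

```typescript
// Hypothetical constant mirroring the chunk size named in the commit message.
const MAX_IDS_PER_WORKFLOW = 10_000;

// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Example: 25,000 placeholder IDs become three workflow-sized batches.
const exampleIds: string[] = [];
for (let i = 0; i < 25_000; i += 1) {
  exampleIds.push(`web~entity-${i}`);
}
const batches = chunk(exampleIds, MAX_IDS_PER_WORKFLOW);
console.log(batches.map((batch) => batch.length));
```
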
@cursor

cursor bot commented Feb 2, 2026

PR Summary

Medium Risk
Touches the embeddings pipeline across the Rust store, Temporal client, and worker. Although the change aims to improve data minimization, mistakes could lead to missing embeddings or to properties being wrongly included or excluded.

Overview
Refactors entity embedding updates to pass only EntityIds to Temporal and have the TS worker fetch entities via queryEntities, rather than sending full entity payloads from the Rust store.

Adds configurable embeddingExclusions (by entity-type base URL → property base URLs) and applies the property stripping in the worker right before embedding generation, while keeping FlowRun entities excluded via both query filter and a runtime guard. Temporal chunking is adjusted to handle large batches efficiently (chunk size increased to 10,000 IDs).

Written by Cursor Bugbot for commit 7295282. This will update automatically on new commits.

@github-actions github-actions bot added the labels area/apps > hash* (affects HASH, a `hash-*` app), area/libs (relates to first-party libraries/crates/packages), type/eng > backend (owned by the @backend team), and area/apps on Feb 2, 2026
Member Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@TimDiekmann TimDiekmann mentioned this pull request Feb 2, 2026
3 tasks
@vercel

vercel bot commented Feb 2, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| hash | Ready | Preview, Comment | Feb 4, 2026 1:44pm |

3 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| hashdotdesign | Ignored | Preview | Feb 4, 2026 1:44pm |
| hashdotdesign-tokens | Ignored | Preview | Feb 4, 2026 1:44pm |
| petrinaut | Skipped | | Feb 4, 2026 1:44pm |

@codecov

codecov bot commented Feb 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.73%. Comparing base (9806505) to head (7295282).

Additional details and impacted files
@@                               Coverage Diff                                @@
##           t/be-310-dont-allow-filtering-users-by-email    #8345      +/-   ##
================================================================================
+ Coverage                                         49.71%   49.73%   +0.02%     
================================================================================
  Files                                               493      493              
  Lines                                             56672    56644      -28     
  Branches                                           1507     1507              
================================================================================
  Hits                                              28173    28173              
+ Misses                                            28211    28183      -28     
  Partials                                            288      288              
Flag Coverage Δ
apps.hash-api 0.00% <ø> (ø)
rust.hash-graph-validation 83.45% <ø> (ø)

@augmentcode

augmentcode bot commented Feb 2, 2026

🤖 Augment PR Summary

Summary: Refactors entity-embedding updates to pass only entity IDs to the Temporal worker so the worker fetches and filters entities before generating embeddings.

Changes:

  • Worker workflow `updateEntityEmbeddings` now accepts `entityIds`, constructs a Graph query filter from IDs, and queries entities in the worker.
  • Adds `embeddingExclusions` (entity-type base URL → property base URLs) and applies exclusions in the worker before calling embedding generation.
  • Postgres store now enqueues embedding updates using entity IDs plus the embedding-exclusion config rather than sending full entities.
  • Temporal client workflow API updated to accept IDs/exclusions and increases batching from 100 entities to 10,000 IDs per workflow invocation.

Technical Notes: Moves sensitive-property filtering into the worker to reduce exposure risk and ensure exclusions are applied immediately before embedding input creation.

@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.

Return early when no entity IDs are provided to avoid building
an ambiguous filter with an empty `any: []` clause.
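
The early return described in this commit can be sketched as follows. This is an illustration only: the `Filter` type is a simplified, hypothetical stand-in for the Graph API's filter type, `buildEntityIdFilter` is an invented helper name, and the `<webId>~<entityUuid>` ID format and the `webId`/`uuid` paths are assumptions.

```typescript
// Assumption: an entity ID has the form "<webId>~<entityUuid>".
type EntityId = string;

// Hypothetical, simplified filter shape; the Graph API's real filter type is richer.
type Filter =
  | { equal: [{ path: string[] }, { parameter: string }] }
  | { all: Filter[] }
  | { any: Filter[] };

function buildEntityIdFilter(entityIds: readonly EntityId[]): Filter | null {
  // An empty `any: []` has no obvious meaning, so signal "nothing to query".
  if (entityIds.length === 0) {
    return null;
  }

  return {
    any: entityIds.map((entityId): Filter => {
      const [webId, entityUuid] = entityId.split("~");
      if (!webId || !entityUuid) {
        throw new Error(`Malformed entity ID: ${entityId}`);
      }
      // Match one entity: both the web and the UUID must be equal.
      return {
        all: [
          { equal: [{ path: ["webId"] }, { parameter: webId }] },
          { equal: [{ path: ["uuid"] }, { parameter: entityUuid }] },
        ],
      };
    }),
  };
}
```

A caller would check for `null` and skip the entity query entirely, so an empty ID list never reaches the Graph.
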
@vercel vercel bot temporarily deployed to Preview – petrinaut February 2, 2026 15:12 Inactive
@graphite-app graphite-app bot requested review from a team February 2, 2026 16:28
@github-actions
Contributor

github-actions bot commented Feb 2, 2026

Benchmark results

@rust/hash-graph-benches – Integrations

policy_resolution_large

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 2002 $$27.0 \mathrm{ms} \pm 184 \mathrm{μs}\left({\color{lightgreen}-5.183 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$3.25 \mathrm{ms} \pm 15.9 \mathrm{μs}\left({\color{gray}1.53 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 1001 $$12.5 \mathrm{ms} \pm 92.1 \mathrm{μs}\left({\color{red}5.09 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 3314 $$42.5 \mathrm{ms} \pm 354 \mathrm{μs}\left({\color{gray}-0.542 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$14.4 \mathrm{ms} \pm 87.0 \mathrm{μs}\left({\color{gray}-2.530 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 1526 $$23.3 \mathrm{ms} \pm 138 \mathrm{μs}\left({\color{lightgreen}-8.375 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 2078 $$43.1 \mathrm{ms} \pm 191 \mathrm{μs}\left({\color{gray}0.938 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$20.9 \mathrm{ms} \pm 119 \mathrm{μs}\left({\color{red}5.22 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 1033 $$28.9 \mathrm{ms} \pm 154 \mathrm{μs}\left({\color{gray}4.18 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_medium

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 102 $$3.65 \mathrm{ms} \pm 24.4 \mathrm{μs}\left({\color{gray}-0.800 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.80 \mathrm{ms} \pm 13.9 \mathrm{μs}\left({\color{gray}-1.893 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 51 $$3.16 \mathrm{ms} \pm 17.2 \mathrm{μs}\left({\color{gray}-0.773 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 269 $$4.95 \mathrm{ms} \pm 20.8 \mathrm{μs}\left({\color{gray}-2.571 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$3.44 \mathrm{ms} \pm 22.5 \mathrm{μs}\left({\color{gray}-0.454 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 107 $$3.97 \mathrm{ms} \pm 22.9 \mathrm{μs}\left({\color{gray}-1.380 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 133 $$4.26 \mathrm{ms} \pm 28.7 \mathrm{μs}\left({\color{gray}2.87 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$3.23 \mathrm{ms} \pm 16.0 \mathrm{μs}\left({\color{gray}-0.811 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 63 $$3.89 \mathrm{ms} \pm 26.0 \mathrm{μs}\left({\color{gray}0.933 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_none

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 2 $$2.35 \mathrm{ms} \pm 9.36 \mathrm{μs}\left({\color{gray}-1.728 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.31 \mathrm{ms} \pm 11.0 \mathrm{μs}\left({\color{gray}-0.206 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 1 $$2.40 \mathrm{ms} \pm 9.06 \mathrm{μs}\left({\color{gray}0.155 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 8 $$2.60 \mathrm{ms} \pm 9.89 \mathrm{μs}\left({\color{gray}-0.617 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$2.47 \mathrm{ms} \pm 8.65 \mathrm{μs}\left({\color{gray}-0.199 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 3 $$2.67 \mathrm{ms} \pm 10.4 \mathrm{μs}\left({\color{gray}-0.120 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_small

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 52 $$2.76 \mathrm{ms} \pm 12.5 \mathrm{μs}\left({\color{gray}-0.108 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.48 \mathrm{ms} \pm 13.0 \mathrm{μs}\left({\color{gray}1.30 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 25 $$2.60 \mathrm{ms} \pm 13.9 \mathrm{μs}\left({\color{gray}-0.052 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 94 $$3.13 \mathrm{ms} \pm 18.3 \mathrm{μs}\left({\color{gray}1.12 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$2.68 \mathrm{ms} \pm 10.6 \mathrm{μs}\left({\color{gray}-0.206 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 26 $$2.88 \mathrm{ms} \pm 13.4 \mathrm{μs}\left({\color{gray}0.949 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 66 $$3.07 \mathrm{ms} \pm 17.2 \mathrm{μs}\left({\color{gray}0.426 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$2.65 \mathrm{ms} \pm 17.1 \mathrm{μs}\left({\color{gray}-0.245 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 29 $$2.88 \mathrm{ms} \pm 13.7 \mathrm{μs}\left({\color{gray}-1.438 \mathrm{\%}}\right) $$ Flame Graph

read_scaling_complete

Function Value Mean Flame graphs
entity_by_id;one_depth 1 entities $$39.5 \mathrm{ms} \pm 156 \mathrm{μs}\left({\color{gray}0.450 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 10 entities $$76.8 \mathrm{ms} \pm 425 \mathrm{μs}\left({\color{gray}1.02 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 25 entities $$43.1 \mathrm{ms} \pm 146 \mathrm{μs}\left({\color{gray}-0.215 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 5 entities $$46.0 \mathrm{ms} \pm 260 \mathrm{μs}\left({\color{gray}0.376 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 50 entities $$54.2 \mathrm{ms} \pm 342 \mathrm{μs}\left({\color{gray}-0.441 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 1 entities $$41.3 \mathrm{ms} \pm 180 \mathrm{μs}\left({\color{gray}0.775 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 10 entities $$421 \mathrm{ms} \pm 917 \mathrm{μs}\left({\color{gray}0.958 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 25 entities $$93.9 \mathrm{ms} \pm 370 \mathrm{μs}\left({\color{gray}-1.615 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 5 entities $$84.3 \mathrm{ms} \pm 360 \mathrm{μs}\left({\color{gray}-1.051 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 50 entities $$280 \mathrm{ms} \pm 723 \mathrm{μs}\left({\color{lightgreen}-10.621 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 1 entities $$14.9 \mathrm{ms} \pm 73.1 \mathrm{μs}\left({\color{gray}-2.960 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 10 entities $$14.8 \mathrm{ms} \pm 69.4 \mathrm{μs}\left({\color{gray}-4.952 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 25 entities $$15.2 \mathrm{ms} \pm 64.5 \mathrm{μs}\left({\color{gray}-3.863 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 5 entities $$14.8 \mathrm{ms} \pm 61.5 \mathrm{μs}\left({\color{lightgreen}-5.272 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 50 entities $$17.8 \mathrm{ms} \pm 86.8 \mathrm{μs}\left({\color{gray}-4.060 \mathrm{\%}}\right) $$ Flame Graph

read_scaling_linkless

Function Value Mean Flame graphs
entity_by_id 1 entities $$14.7 \mathrm{ms} \pm 65.6 \mathrm{μs}\left({\color{lightgreen}-5.387 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10 entities $$14.6 \mathrm{ms} \pm 65.4 \mathrm{μs}\left({\color{gray}-4.442 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 100 entities $$14.6 \mathrm{ms} \pm 63.9 \mathrm{μs}\left({\color{gray}-3.782 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1000 entities $$15.2 \mathrm{ms} \pm 69.0 \mathrm{μs}\left({\color{lightgreen}-5.639 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10000 entities $$22.6 \mathrm{ms} \pm 151 \mathrm{μs}\left({\color{gray}-4.901 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity

Function Value Mean Flame graphs
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 $$30.3 \mathrm{ms} \pm 258 \mathrm{μs}\left({\color{gray}-1.933 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 $$30.1 \mathrm{ms} \pm 276 \mathrm{μs}\left({\color{lightgreen}-7.502 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 $$29.6 \mathrm{ms} \pm 259 \mathrm{μs}\left({\color{lightgreen}-8.621 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 $$31.3 \mathrm{ms} \pm 301 \mathrm{μs}\left({\color{gray}3.16 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 $$30.5 \mathrm{ms} \pm 330 \mathrm{μs}\left({\color{gray}-2.127 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 $$31.1 \mathrm{ms} \pm 304 \mathrm{μs}\left({\color{gray}-4.340 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 $$30.7 \mathrm{ms} \pm 320 \mathrm{μs}\left({\color{gray}1.42 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 $$30.1 \mathrm{ms} \pm 304 \mathrm{μs}\left({\color{lightgreen}-5.482 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 $$30.8 \mathrm{ms} \pm 339 \mathrm{μs}\left({\color{gray}-2.469 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity_type

Function Value Mean Flame graphs
get_entity_type_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba $$8.15 \mathrm{ms} \pm 42.6 \mathrm{μs}\left({\color{gray}-0.085 \mathrm{\%}}\right) $$ Flame Graph

representative_read_multiple_entities

Function Value Mean Flame graphs
entity_by_property traversal_paths=0 0 $$91.5 \mathrm{ms} \pm 398 \mathrm{μs}\left({\color{gray}-0.205 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=255 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true $$141 \mathrm{ms} \pm 478 \mathrm{μs}\left({\color{gray}-0.496 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false $$102 \mathrm{ms} \pm 419 \mathrm{μs}\left({\color{gray}2.28 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true $$106 \mathrm{ms} \pm 481 \mathrm{μs}\left({\color{gray}-1.542 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true $$117 \mathrm{ms} \pm 605 \mathrm{μs}\left({\color{gray}0.665 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true $$123 \mathrm{ms} \pm 451 \mathrm{μs}\left({\color{gray}-0.046 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=0 0 $$90.7 \mathrm{ms} \pm 412 \mathrm{μs}\left({\color{gray}-0.112 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=255 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true $$116 \mathrm{ms} \pm 542 \mathrm{μs}\left({\color{gray}0.690 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false $$96.9 \mathrm{ms} \pm 420 \mathrm{μs}\left({\color{gray}0.161 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true $$102 \mathrm{ms} \pm 385 \mathrm{μs}\left({\color{gray}-2.969 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true $$104 \mathrm{ms} \pm 502 \mathrm{μs}\left({\color{gray}-1.415 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true $$103 \mathrm{ms} \pm 362 \mathrm{μs}\left({\color{gray}-2.274 \mathrm{\%}}\right) $$

scenarios

Function Value Mean Flame graphs
full_test query-limited $$135 \mathrm{ms} \pm 532 \mathrm{μs}\left({\color{gray}2.60 \mathrm{\%}}\right) $$ Flame Graph
full_test query-unlimited $$134 \mathrm{ms} \pm 554 \mathrm{μs}\left({\color{gray}0.373 \mathrm{\%}}\right) $$ Flame Graph
linked_queries query-limited $$105 \mathrm{ms} \pm 676 \mathrm{μs}\left({\color{gray}-0.897 \mathrm{\%}}\right) $$ Flame Graph
linked_queries query-unlimited $$609 \mathrm{ms} \pm 3.48 \mathrm{ms}\left({\color{gray}1.78 \mathrm{\%}}\right) $$ Flame Graph

@TimDiekmann TimDiekmann force-pushed the t/be-359-handle-large-entities-in-temporal-update-embeddings-workflow branch from 02733f8 to 7295282 Compare February 4, 2026 13:37
@vercel vercel bot temporarily deployed to Preview – petrinaut February 4, 2026 13:37 Inactive

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

  const generatedEmbeddings =
    await aiActivities.createEntityEmbeddingsActivity({
-     entityProperties: entity.properties,
+     entityProperties: filteredProperties,

Missing embedding exclusions in bulk update function

Medium Severity

The updateAllEntityEmbeddings function calls updateEntityEmbeddings without passing embeddingExclusions, meaning protected properties (like email addresses on User entities) won't be excluded when this function is used for bulk reindexing. While the Rust code properly passes exclusions through start_update_entity_embeddings_workflow, this TypeScript-only path bypasses that protection.

Member Author

We're going to move this endpoint to the graph.

2 participants