Localize PersonalizedPageRank traversal#6
Conversation
📝 WalkthroughWalkthroughPersonalizedPageRank now accepts a Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Search Caller
participant SGM as searchGraphPPR
participant Graph as Graph.PersonalizedPageRank
participant BFS as localPPRNeighborhoodLocked
Caller->>SGM: searchGraphPPR(ctx, seeds, opts)
activate SGM
SGM->>Graph: PersonalizedPageRank(ctx, seeds, opts.MaxHops, ...)
activate Graph
Graph->>BFS: localPPRNeighborhoodLocked(ctx, seeds, maxHops)
activate BFS
rect rgba(100, 150, 200, 0.5)
Note over BFS: BFS on forward & reverse edges\ncollect nodeIDs and per-node restricted neighbors
BFS->>BFS: check ctx.Err() during traversal
end
BFS-->>Graph: (nodeIDs, neighbors, error)
deactivate BFS
alt ctx canceled or BFS error
Graph-->>SGM: (nil, error)
SGM-->>Caller: (nil, wrapped error)
else seeds/neighborhood empty
Graph-->>SGM: (nil, nil)
SGM-->>Caller: (nil, nil)
else proceed with local PPR
rect rgba(150, 200, 100, 0.5)
Note over Graph: Initialize ranks on local nodes\niterate up to maxIter\ncheck ctx.Err() between iterations
end
Graph-->>SGM: (ranks, nil)
SGM-->>Caller: (ranks, nil)
end
deactivate Graph
deactivate SGM
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/graph/graph.go`:
- Around line 412-420: The call to g.localPPRNeighborhoodLocked returns an
unwrapped error; update the error return to wrap it with context using
fmt.Errorf so callers get useful debug info (e.g., replace "return nil, err"
with "return nil, fmt.Errorf(\"localPPRNeighborhoodLocked failed: %w\", err)").
Ensure you import fmt if needed and keep the surrounding lock/unlock and
nil/empty-nodes behavior unchanged; references: g.localPPRNeighborhoodLocked and
the error-return site in the function that holds g.mu.RLock/g.mu.RUnlock.
- Around line 392-410: The raw ctx.Err() returns in PersonalizedPageRank and
localPPRNeighborhoodLocked should be wrapped with contextual error messages;
replace direct returns of ctx.Err() (e.g., in PersonalizedPageRank around the
initial check and the later checks previously at lines noted in the review, and
in localPPRNeighborhoodLocked) with fmt.Errorf("context: %w", err) so callers
get descriptive context; search for ctx.Err() usages inside the
PersonalizedPageRank method and localPPRNeighborhoodLocked helper and update
each return to wrap the error using fmt.Errorf("context: %w", err).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f88a837-2bbf-463d-8c64-fbd6fd42d8c0
📒 Files selected for processing (3)
internal/graph/graph.gointernal/graph/graph_test.gointernal/search/graph.go
| // PersonalizedPageRank computes PPR scores seeded from the given entity IDs. | ||
| // alpha is the teleport probability (typically 0.15), maxIter limits iterations, | ||
| // and epsilon is the convergence threshold. | ||
| func (g *Graph) PersonalizedPageRank(seeds []string, alpha float64, maxIter int, epsilon float64) map[string]float64 { | ||
| g.mu.RLock() | ||
| defer g.mu.RUnlock() | ||
|
|
||
| // The random walk is restricted to the local neighborhood within maxHops of the | ||
| // seeds so per-query work scales with the relevant subgraph rather than the | ||
| // entire KB graph. alpha is the teleport probability (typically 0.15), maxIter | ||
| // limits iterations, and epsilon is the convergence threshold. | ||
| func (g *Graph) PersonalizedPageRank( | ||
| ctx context.Context, | ||
| seeds []string, | ||
| maxHops int, | ||
| alpha float64, | ||
| maxIter int, | ||
| epsilon float64, | ||
| ) (map[string]float64, error) { | ||
| if len(seeds) == 0 { | ||
| return nil | ||
| return nil, nil | ||
| } | ||
|
|
||
| // Build the personalization vector (uniform over seeds). | ||
| seedWeight := 1.0 / float64(len(seeds)) | ||
| personal := make(map[string]float64, len(seeds)) | ||
| for _, s := range seeds { | ||
| personal[s] = seedWeight | ||
| if err := ctx.Err(); err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
Wrap context errors with descriptive context.
Per coding guidelines, errors should be wrapped using fmt.Errorf("context: %w", err). The raw ctx.Err() returns at lines 409 and similar locations throughout this function lack context about where the cancellation occurred.
Proposed fix
if err := ctx.Err(); err != nil {
- return nil, err
+ return nil, fmt.Errorf("ppr: context check: %w", err)
}Apply similar wrapping to other ctx.Err() returns at lines 446-448, 456-458, 472-474, and in localPPRNeighborhoodLocked at lines 512-514.
As per coding guidelines: **/*.go: Always wrap errors with context using fmt.Errorf("context: %w", err). Never return naked errors.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // PersonalizedPageRank computes PPR scores seeded from the given entity IDs. | |
| // alpha is the teleport probability (typically 0.15), maxIter limits iterations, | |
| // and epsilon is the convergence threshold. | |
| func (g *Graph) PersonalizedPageRank(seeds []string, alpha float64, maxIter int, epsilon float64) map[string]float64 { | |
| g.mu.RLock() | |
| defer g.mu.RUnlock() | |
| // The random walk is restricted to the local neighborhood within maxHops of the | |
| // seeds so per-query work scales with the relevant subgraph rather than the | |
| // entire KB graph. alpha is the teleport probability (typically 0.15), maxIter | |
| // limits iterations, and epsilon is the convergence threshold. | |
| func (g *Graph) PersonalizedPageRank( | |
| ctx context.Context, | |
| seeds []string, | |
| maxHops int, | |
| alpha float64, | |
| maxIter int, | |
| epsilon float64, | |
| ) (map[string]float64, error) { | |
| if len(seeds) == 0 { | |
| return nil | |
| return nil, nil | |
| } | |
| // Build the personalization vector (uniform over seeds). | |
| seedWeight := 1.0 / float64(len(seeds)) | |
| personal := make(map[string]float64, len(seeds)) | |
| for _, s := range seeds { | |
| personal[s] = seedWeight | |
| if err := ctx.Err(); err != nil { | |
| return nil, err | |
| } | |
| // PersonalizedPageRank computes PPR scores seeded from the given entity IDs. | |
| // The random walk is restricted to the local neighborhood within maxHops of the | |
| // seeds so per-query work scales with the relevant subgraph rather than the | |
| // entire KB graph. alpha is the teleport probability (typically 0.15), maxIter | |
| // limits iterations, and epsilon is the convergence threshold. | |
| func (g *Graph) PersonalizedPageRank( | |
| ctx context.Context, | |
| seeds []string, | |
| maxHops int, | |
| alpha float64, | |
| maxIter int, | |
| epsilon float64, | |
| ) (map[string]float64, error) { | |
| if len(seeds) == 0 { | |
| return nil, nil | |
| } | |
| if err := ctx.Err(); err != nil { | |
| return nil, fmt.Errorf("ppr: context check: %w", err) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/graph/graph.go` around lines 392 - 410, The raw ctx.Err() returns in
PersonalizedPageRank and localPPRNeighborhoodLocked should be wrapped with
contextual error messages; replace direct returns of ctx.Err() (e.g., in
PersonalizedPageRank around the initial check and the later checks previously at
lines noted in the review, and in localPPRNeighborhoodLocked) with
fmt.Errorf("context: %w", err) so callers get descriptive context; search for
ctx.Err() usages inside the PersonalizedPageRank method and
localPPRNeighborhoodLocked helper and update each return to wrap the error using
fmt.Errorf("context: %w", err).
| g.mu.RLock() | ||
| nodes, neighbors, err := g.localPPRNeighborhoodLocked(ctx, seeds, maxHops) | ||
| g.mu.RUnlock() | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| for id := range g.reverse { | ||
| allNodes[id] = struct{}{} | ||
| if len(nodes) == 0 { | ||
| return nil, nil | ||
| } |
There was a problem hiding this comment.
Wrap error from localPPRNeighborhoodLocked.
The error returned from the helper should be wrapped with context to aid debugging.
Proposed fix
g.mu.RLock()
nodes, neighbors, err := g.localPPRNeighborhoodLocked(ctx, seeds, maxHops)
g.mu.RUnlock()
if err != nil {
- return nil, err
+ return nil, fmt.Errorf("ppr neighborhood: %w", err)
}As per coding guidelines: **/*.go: Always wrap errors with context using fmt.Errorf("context: %w", err). Never return naked errors.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| g.mu.RLock() | |
| nodes, neighbors, err := g.localPPRNeighborhoodLocked(ctx, seeds, maxHops) | |
| g.mu.RUnlock() | |
| if err != nil { | |
| return nil, err | |
| } | |
| for id := range g.reverse { | |
| allNodes[id] = struct{}{} | |
| if len(nodes) == 0 { | |
| return nil, nil | |
| } | |
| g.mu.RLock() | |
| nodes, neighbors, err := g.localPPRNeighborhoodLocked(ctx, seeds, maxHops) | |
| g.mu.RUnlock() | |
| if err != nil { | |
| return nil, fmt.Errorf("ppr neighborhood: %w", err) | |
| } | |
| if len(nodes) == 0 { | |
| return nil, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/graph/graph.go` around lines 412 - 420, The call to
g.localPPRNeighborhoodLocked returns an unwrapped error; update the error return
to wrap it with context using fmt.Errorf so callers get useful debug info (e.g.,
replace "return nil, err" with "return nil,
fmt.Errorf(\"localPPRNeighborhoodLocked failed: %w\", err)"). Ensure you import
fmt if needed and keep the surrounding lock/unlock and nil/empty-nodes behavior
unchanged; references: g.localPPRNeighborhoodLocked and the error-return site in
the function that holds g.mu.RLock/g.mu.RUnlock.
There was a problem hiding this comment.
Pull request overview
This PR optimizes Personalized PageRank (PPR) scoring in the search graph scorer by restricting computation to a seed-local subgraph (bounded by max_hops) and making the traversal cancelable via context.Context, with tests covering locality and cancellation behavior.
Changes:
- Thread
context.Contextthrough the PPR call path and return an error on cancellation. - Restrict PPR iterations to a BFS-computed local neighborhood within
MaxHops. - Add regression tests for neighborhood bounding and canceled contexts.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| internal/search/graph.go | Passes ctx + opts.MaxHops into graph PPR and wraps errors for search. |
| internal/graph/graph.go | Updates PPR API to be cancelable and localizes PPR to an induced neighborhood subgraph. |
| internal/graph/graph_test.go | Updates existing PPR tests and adds coverage for locality + cancellation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| nodeIDs := make([]string, 0, len(nodes)) | ||
| neighbors := make(map[string][]string, len(nodes)) | ||
| for id := range nodes { | ||
| nodeIDs = append(nodeIDs, id) | ||
| } |
There was a problem hiding this comment.
localPPRNeighborhoodLocked builds nodeIDs by iterating a map (for id := range nodes), which makes the subsequent PageRank iterations run over nodes in nondeterministic order. Because rank propagation uses floating-point accumulation, this can lead to run-to-run score jitter and potentially flaky tests. Consider sorting nodeIDs (e.g., sort.Strings) before returning/using it so PPR traversal order is deterministic.
| for _, edges := range [][]Edge{g.forward[cur], g.reverse[cur]} { | ||
| for _, e := range edges { | ||
| nodes[e.TargetID] = struct{}{} | ||
| if _, seen := dist[e.TargetID]; seen { | ||
| continue |
There was a problem hiding this comment.
In localPPRNeighborhoodLocked, cancellation is only checked every 256 queue pops, but the nested edge scan for a single high-degree node can still take a long time after the request is canceled. Consider adding a periodic ctx.Err() check inside the inner edge loops so cancellation latency is bounded by edge-scan work, not just queue progress.
There was a problem hiding this comment.
Addressed in e392a16. localPPRNeighborhoodLocked now checks ctx.Err() inside the BFS edge-scan loop so cancellation latency is bounded by edge-scan work, not just queue progress.
| for _, e := range g.reverse[id] { | ||
| newRank[e.TargetID] += share | ||
| share := rank[id] / float64(len(neighbors[id])) | ||
| for _, nextID := range neighbors[id] { |
There was a problem hiding this comment.
ctx.Err() is only checked every 256 nodes, but cancellation can still be delayed if a node has a very large neighbor list because the inner for _, nextID := range neighbors[id] loop has no checks. Consider adding a periodic cancellation check inside that inner loop so worst-case cancellation latency is bounded by edge-scan work too.
| for _, nextID := range neighbors[id] { | |
| for j, nextID := range neighbors[id] { | |
| if j%256 == 0 { | |
| if err := ctx.Err(); err != nil { | |
| return nil, err | |
| } | |
| } |
There was a problem hiding this comment.
Addressed in e392a16. Rank propagation now checks ctx.Err() periodically inside the neighbors[id] loop, so a high-degree node cannot delay cancellation until the full neighbor list has been scanned.
| for _, id := range nodeIDs { | ||
| localNeighbors := make([]string, 0, len(g.forward[id])+len(g.reverse[id])) | ||
| for _, edges := range [][]Edge{g.forward[id], g.reverse[id]} { | ||
| for _, e := range edges { |
There was a problem hiding this comment.
localPPRNeighborhoodLocked builds neighbors by scanning all edges for each node in the induced subgraph, but there are no ctx.Err() checks in this phase. If the neighborhood is large, cancellation may be delayed until after this full pass completes. Consider adding a periodic cancellation check in the loops that build localNeighbors as well.
| for _, id := range nodeIDs { | |
| localNeighbors := make([]string, 0, len(g.forward[id])+len(g.reverse[id])) | |
| for _, edges := range [][]Edge{g.forward[id], g.reverse[id]} { | |
| for _, e := range edges { | |
| for i, id := range nodeIDs { | |
| if i%256 == 0 { | |
| if err := ctx.Err(); err != nil { | |
| return nil, nil, err | |
| } | |
| } | |
| localNeighbors := make([]string, 0, len(g.forward[id])+len(g.reverse[id])) | |
| edgeCount := 0 | |
| for _, edges := range [][]Edge{g.forward[id], g.reverse[id]} { | |
| for _, e := range edges { | |
| if edgeCount%256 == 0 { | |
| if err := ctx.Err(); err != nil { | |
| return nil, nil, err | |
| } | |
| } | |
| edgeCount++ |
There was a problem hiding this comment.
Addressed in e392a16. The induced-subgraph neighbor builder now checks ctx.Err() while iterating nodeIDs and while scanning per-node edge lists.
|
Follow-up pushed in Addressed the review feedback by:
Validation stayed green with |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/graph/graph.go (1)
409-417:⚠️ Potential issue | 🟡 MinorWrap the new PPR errors before returning them.
These paths still bubble up raw
ctx.Err()/ helper errors, so once the caller wraps them you still can't tell whether cancellation happened before BFS, during neighborhood discovery, or mid-iteration.🔧 Proposed fix
import ( "context" + "fmt" "math" "sort" "sync" "time" ) @@ if err := ctx.Err(); err != nil { - return nil, err + return nil, fmt.Errorf("personalized pagerank: initial context check: %w", err) } @@ if err != nil { - return nil, err + return nil, fmt.Errorf("personalized pagerank: build local neighborhood: %w", err) } @@ if err := ctx.Err(); err != nil { - return nil, nil, err + return nil, nil, fmt.Errorf("local ppr neighborhood: bfs context check: %w", err) }As per coding guidelines:
**/*.go: Always wrap errors with context usingfmt.Errorf("context: %w", err). Never return naked errors.Also applies to: 447-449, 457-459, 467-469, 478-480, 518-520, 530-532, 552-554, 561-563
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/graph/graph.go` around lines 409 - 417, The function currently returns raw errors from ctx.Err() and helper calls (e.g., the ctx.Err() check and the g.localPPRNeighborhoodLocked(...) call); update each return to wrap the underlying error with context using fmt.Errorf, e.g. fmt.Errorf("checking context before PPR neighborhood: %w", err) and fmt.Errorf("localPPRNeighborhoodLocked failed: %w", err). Apply the same wrapping pattern to the other similar return sites referenced (the other ctx.Err() and helper error returns) so all naked errors are wrapped with clear contextual messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/graph/graph.go`:
- Around line 461-465: The loop drops probability mass when a node has no
neighbors (e.g., isolated seed or maxHops==0); instead of "continue", detect
dangling nodes (len(neighbors[id])==0) and redistribute that node's rank into
the personalization vector (personal[id] += rank[id]) or treat it as a self-loop
(compute share = rank[id] and add to nextRank[id]) before the
neighbor-distribution step so total mass is conserved; apply the same fix at the
other occurrence around the code that currently checks len(neighbors[id]) (the
noted spots at the later block) to ensure no localized sinks leak probability.
---
Duplicate comments:
In `@internal/graph/graph.go`:
- Around line 409-417: The function currently returns raw errors from ctx.Err()
and helper calls (e.g., the ctx.Err() check and the
g.localPPRNeighborhoodLocked(...) call); update each return to wrap the
underlying error with context using fmt.Errorf, e.g. fmt.Errorf("checking
context before PPR neighborhood: %w", err) and
fmt.Errorf("localPPRNeighborhoodLocked failed: %w", err). Apply the same
wrapping pattern to the other similar return sites referenced (the other
ctx.Err() and helper error returns) so all naked errors are wrapped with clear
contextual messages.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| if len(neighbors[id]) == 0 { | ||
| continue | ||
| } | ||
| for _, e := range g.reverse[id] { | ||
| newRank[e.TargetID] += share | ||
| share := rank[id] / float64(len(neighbors[id])) | ||
| for j, nextID := range neighbors[id] { |
There was a problem hiding this comment.
Localized sinks currently leak probability mass.
Restricting the walk to the local subgraph can create sinks, e.g. maxHops == 0 or an isolated seed. The current continue path drops that mass, so a seed-only neighborhood converges to alpha instead of 1.0. Redistribute dangling mass through personal (or add self-loops) before the update step; a small maxHops == 0 regression test would catch this.
🧮 Proposed fix
+ danglingMass := 0.0
for i, id := range nodes {
if i%256 == 0 {
if err := ctx.Err(); err != nil {
return nil, err
}
}
if len(neighbors[id]) == 0 {
+ danglingMass += rank[id]
continue
}
share := rank[id] / float64(len(neighbors[id]))
@@
- nr := alpha*personal[id] + (1-alpha)*newRank[id]
+ nr := alpha*personal[id] + (1-alpha)*(newRank[id]+danglingMass*personal[id])Also applies to: 482-483
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/graph/graph.go` around lines 461 - 465, The loop drops probability
mass when a node has no neighbors (e.g., isolated seed or maxHops==0); instead
of "continue", detect dangling nodes (len(neighbors[id])==0) and redistribute
that node's rank into the personalization vector (personal[id] += rank[id]) or
treat it as a self-loop (compute share = rank[id] and add to nextRank[id])
before the neighbor-distribution step so total mass is conserved; apply the same
fix at the other occurrence around the code that currently checks
len(neighbors[id]) (the noted spots at the later block) to ensure no localized
sinks leak probability.
Summary
max_hopsValidation
Closes #3
Summary by CodeRabbit
New Features
Bug Fixes
Performance
Tests