Experimental: heuristic dynamic-dispatch edge synthesis for long-tail unresolved call sites #687

@luojiyin1987

Summary

callback-synthesizer.ts bridges dynamic dispatch with hand-written regex rules. It catches field-backed observers, string-keyed EventEmitters, setState→render, JSX child passing, Vue handlers, gRPC stub→impl, and similar patterns. Each new pattern requires a PR, which doesn't scale to ad-hoc patterns (custom decorators, custom event buses, custom IoC containers, factory-based dispatch).

Proposal: add an optional heuristic synthesis pass for syntactically suspicious call sites that static analysis can't resolve. Not "all functions with zero outgoing edges" — only patterns that are known static blind spots: computed member calls, getattr, decorator registration, string-keyed emit/dispatch, factory registration, proxy-like access, container lookup. The heuristic backend can be rule-based, LLM-assisted, or both — the graph only cares about the resulting edges.

Gated behind an optional config/env var; disabled by default; zero-impact for users who don't opt in.

Motivation (the gap)

From docs/design/dynamic-dispatch-coverage-playbook.md:

codegraph's value is being the map — answering structural/flow questions that grep/Read cannot. Agents will use codegraph instead of Read only when it is sufficient.

The deterministic synthesizer has hard gaps:

  1. Ad-hoc user patterns: custom decorators, event buses, IoC containers, factory dispatch
  2. Static blind spots: getattr(obj, name) / __call__ / metaclasses in Python, Proxy / obj[key]() in JS, computed property access
  3. Framework internals: even resolved frameworks (FastAPI, Laravel) only cover top-level route→handler; middleware chains and DI wiring remain opaque
  4. Every new framework costs a PR, which doesn't scale

Proposed design (experimental v1 scope)

Input — narrow, not broad

Only inspect syntactically suspicious unresolved dynamic-dispatch forms:

  • computed member calls (obj[key](), getattr(obj, name))
  • decorator registration (@router.action() with no resolved handler)
  • string-keyed emit/dispatch (emit('event', ...), dispatch('action'))
  • factory registration (container.register(key, handler))
  • proxy-like access patterns
  • framework/container lookup calls

A leaf function with zero outgoing calls edges is NOT a signal — it's just a leaf.
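
As a rough illustration of the "narrow input" idea, the sketch below classifies a call-site string against a few of the forms listed above. The type name SuspiciousKind, the function classifyCallSite, and the regexes themselves are hypothetical; a real pass would presumably work on the parsed AST rather than raw text.

```typescript
// Hypothetical names; none of these exist in the codebase described above.
type SuspiciousKind =
  | 'getattr'
  | 'decorator-registration'
  | 'factory-registration'
  | 'string-keyed-dispatch'
  | 'computed-member-call';

// Ordered rules: the first matching pattern wins.
const RULES: ReadonlyArray<readonly [SuspiciousKind, RegExp]> = [
  ['getattr', /\bgetattr\s*\(/],
  ['decorator-registration', /^\s*@[\w.]+\s*\(/],
  ['factory-registration', /\.\s*register\s*\(/],
  ['string-keyed-dispatch', /\b(?:emit|dispatch)\s*\(\s*['"`]/],
  ['computed-member-call', /\]\s*\(/],
];

// Returns the kind of suspicious form, or null when the text looks like
// an ordinary call (or no call at all) and should be left alone.
function classifyCallSite(text: string): SuspiciousKind | null {
  for (const [kind, pattern] of RULES) {
    if (pattern.test(text)) return kind;
  }
  return null;
}
```

A plain call such as doWork(a, b) yields null here, which matches the point that ordinary calls and leaf functions are not a signal.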

Output

{
  source: callerNodeId,
  target: calleeNodeId,
  kind: 'calls',
  provenance: 'heuristic',
  metadata: {
    synthesizedBy: 'heuristic',
    method: 'hybrid', // one of 'llm' | 'rule' | 'hybrid'
    confidence: 0.85,
    registeredAt: '<filePath>:<line>'
  }
}
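
The object above could be typed roughly as follows. This is a sketch only: the interface name HeuristicEdge, the helper makeHeuristicEdge, and the choice of string node IDs are assumptions, and the real Edge contract in callback-synthesizer.ts may differ.

```typescript
// Hypothetical typing of the proposed edge shape; field names follow the proposal.
type HeuristicMethod = 'llm' | 'rule' | 'hybrid';

interface HeuristicEdge {
  source: string; // caller node ID (string IDs are an assumption)
  target: string; // callee node ID
  kind: 'calls';
  provenance: 'heuristic';
  metadata: {
    synthesizedBy: 'heuristic';
    method: HeuristicMethod;
    confidence: number; // expected to lie in [0, 1]
    registeredAt: string; // '<filePath>:<line>'
  };
}

// Builds an edge and rejects confidence values outside [0, 1] (NaN included).
function makeHeuristicEdge(
  source: string,
  target: string,
  method: HeuristicMethod,
  confidence: number,
  registeredAt: string,
): HeuristicEdge {
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError('confidence must be between 0 and 1');
  }
  return {
    source,
    target,
    kind: 'calls',
    provenance: 'heuristic',
    metadata: { synthesizedBy: 'heuristic', method, confidence, registeredAt },
  };
}
```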

Configuration

CODEGRAPH_HEURISTIC_RESOLUTION=1
CODEGRAPH_HEURISTIC_BACKEND=openai-compatible  # or rule-based for deterministic-only
CODEGRAPH_HEURISTIC_ENDPOINT=http://localhost:11434/v1
CODEGRAPH_HEURISTIC_MODEL=gpt-4o-mini
CODEGRAPH_HEURISTIC_API_KEY=sk-...

All unset → no-op. Deterministic synthesis unchanged.
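
One way the "all unset → no-op" rule might be read from the environment is sketched below. Only the CODEGRAPH_HEURISTIC_* variable names come from this proposal; the function loadHeuristicConfig, the rule that only the value "1" opts in, and the default to rule-based when no backend is named are all assumptions. The environment is passed in as a plain record so the sketch does not depend on any runtime.

```typescript
// Hypothetical loader; pass process.env (or any record) as the argument.
type Env = Record<string, string | undefined>;

type HeuristicConfig =
  | { backend: 'rule-based' }
  | { backend: 'openai-compatible'; endpoint: string; model: string; apiKey: string | undefined };

function loadHeuristicConfig(env: Env): HeuristicConfig | null {
  // Assumption: only the literal value "1" opts in; anything else is a no-op.
  if (env['CODEGRAPH_HEURISTIC_RESOLUTION'] !== '1') return null;

  // Assumption: fall back to the deterministic backend when none is named.
  const backend = env['CODEGRAPH_HEURISTIC_BACKEND'] ?? 'rule-based';
  if (backend === 'rule-based') return { backend: 'rule-based' };
  if (backend !== 'openai-compatible') {
    throw new Error(`Unknown heuristic backend: ${backend}`);
  }

  const endpoint = env['CODEGRAPH_HEURISTIC_ENDPOINT'];
  const model = env['CODEGRAPH_HEURISTIC_MODEL'];
  if (!endpoint || !model) {
    throw new Error(
      'openai-compatible backend needs CODEGRAPH_HEURISTIC_ENDPOINT and CODEGRAPH_HEURISTIC_MODEL',
    );
  }
  return {
    backend: 'openai-compatible',
    endpoint,
    model,
    apiKey: env['CODEGRAPH_HEURISTIC_API_KEY'],
  };
}
```

Returning null rather than a disabled config object keeps the opt-out path trivially checkable: callers skip the heuristic pass entirely when the result is null.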

Constraints

  • Batched: collapse multiple unresolved sites into one request
  • Cached per file: reuse results when file hasn't changed
  • Conservative: only emit edges whose confidence exceeds a configurable cutoff
  • No full-body upload in v1: only call-site text + candidate symbol names
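
The batching, caching, and cutoff constraints could be expressed with small helpers like the ones below. All names (fingerprint, FileResultCache, chunk, keepConfident) are hypothetical, and the 32-bit string hash is only a stand-in for whatever content hash the indexer already uses; it can collide, so a real cache would want something stronger.

```typescript
// Cheap 32-bit polynomial string hash (multiplier 31); illustrative only.
function fingerprint(content: string): string {
  let h = 0;
  for (let i = 0; i < content.length; i++) {
    h = (h * 31 + content.charCodeAt(i)) | 0;
  }
  return (h >>> 0).toString(16);
}

// Per-file cache: a stored result is reused only while the content is unchanged.
class FileResultCache<T> {
  private entries = new Map<string, { fp: string; value: T }>();

  get(filePath: string, content: string): T | undefined {
    const entry = this.entries.get(filePath);
    return entry && entry.fp === fingerprint(content) ? entry.value : undefined;
  }

  set(filePath: string, content: string, value: T): void {
    this.entries.set(filePath, { fp: fingerprint(content), value });
  }
}

// Batching: split unresolved sites into groups of at most `size` per request.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (Math.floor(size) !== size || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Conservative filter: keep only proposals strictly above the cutoff.
function keepConfident<T extends { confidence: number }>(
  items: readonly T[],
  cutoff: number,
): T[] {
  return items.filter((item) => item.confidence > cutoff);
}
```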

Tooling surface

  • codegraph status shows whether heuristic resolution is active
  • codegraph_explore Flow section labels heuristic edges distinctly
  • No other UI changes in v1

Prior art in this codebase

  • src/resolution/callback-synthesizer.ts — existing deterministic synthesizer; same Edge[] contract
  • docs/design/callback-edge-synthesis.md and docs/design/dynamic-dispatch-coverage-playbook.md

Non-goals

  • Full-file upload to an external service
  • Semantic search over code embeddings
  • Replacing the deterministic synthesizer
  • Runtime trace injection
  • Multi-provider support in v1 (start with one OpenAI-compatible endpoint)
  • Guaranteed trust by downstream tools (edges must be explicitly requested)

Success criteria

  1. Custom EventBus emit→handler flow surfaces in codegraph_explore
  2. Python custom decorator dispatch surfaces
  3. Node count stable; edges capped at MAX_CALLBACKS_PER_CHANNEL (40)
  4. CODEGRAPH_HEURISTIC_RESOLUTION unset → byte-identical output to before
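
Criterion 3 might be enforced with a cap like the one sketched here. The constant value 40 is quoted from the list above; the helper capPerChannel, the ScoredEdge shape, and the default reading of "channel" as the edge's source node are assumptions, since the proposal does not define how channels map onto heuristic edges.

```typescript
const MAX_CALLBACKS_PER_CHANNEL = 40; // value quoted in this issue

interface ScoredEdge {
  source: string;
  target: string;
  confidence: number;
}

// Keeps at most `limit` edges per channel, preferring higher confidence.
// Assumption: a channel defaults to the edge's source node.
function capPerChannel(
  edges: readonly ScoredEdge[],
  channelOf: (edge: ScoredEdge) => string = (edge) => edge.source,
  limit: number = MAX_CALLBACKS_PER_CHANNEL,
): ScoredEdge[] {
  const byChannel = new Map<string, ScoredEdge[]>();
  for (const edge of edges) {
    const key = channelOf(edge);
    const bucket = byChannel.get(key);
    if (bucket) bucket.push(edge);
    else byChannel.set(key, [edge]);
  }

  const kept: ScoredEdge[] = [];
  byChannel.forEach((bucket) => {
    // Sort a copy so the caller's array order is left untouched.
    const ranked = bucket.slice().sort((a, b) => b.confidence - a.confidence);
    for (const edge of ranked.slice(0, limit)) kept.push(edge);
  });
  return kept;
}
```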
