Lineage: binding-level flow + usage analysis (model) #2

@JonathanTurnock

Description

Part of #1 (data lineage canvas). This is the substantive new work.

Goal

Add a new pure analysis module, crates/pseudoscript-model/src/lineage.rs, exposing Lineage::for_symbol(workspace, data_fqn) -> Lineage and mirroring how Graph::build walks the AST (graph.rs:289).

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Lineage {
    pub of: String,
    pub flows: Vec<FlowEdge>,   // from_data_fqn -> to_data_fqn, with the callable + span it occurs in
    pub usages: Vec<Usage>,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FlowEdge { pub from: String, pub to: String, pub via_callable: String /* + span */ }
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UsageRole { ReturnType, Param, Field, FromTarget, FromSource }
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Usage { pub owner_fqn: String, pub role: UsageRole, pub line: u32, pub col: u32 }
```
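A consumer such as the canvas could fold `flows` into the lineage tree by walking edges upstream from `of`. The sketch below re-declares a trimmed `FlowEdge` (span omitted) and adds a hypothetical `upstream_tree` helper that is not part of the proposed API. A visited set keeps shared ancestors and cyclic flows from looping forever.

```rust
use std::collections::{HashMap, HashSet};

/// Trimmed copy of the proposed FlowEdge (span omitted for brevity).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FlowEdge {
    pub from: String,
    pub to: String,
    pub via_callable: String,
}

/// Hypothetical helper: walks edges upstream (to -> from) starting at `of`,
/// depth-first, returning `(depth, fqn)` pairs. Each FQN is emitted at most
/// once, so shared ancestors and cyclic flows terminate.
pub fn upstream_tree<'a>(of: &'a str, flows: &'a [FlowEdge]) -> Vec<(usize, String)> {
    // Index edges by their target so each upstream step is one map lookup.
    let mut by_target: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for edge in flows {
        by_target.entry(edge.to.as_str()).or_default().push(edge.from.as_str());
    }
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    visit(of, 0, &by_target, &mut seen, &mut out);
    out
}

fn visit<'a>(
    fqn: &'a str,
    depth: usize,
    by_target: &HashMap<&'a str, Vec<&'a str>>,
    seen: &mut HashSet<&'a str>,
    out: &mut Vec<(usize, String)>,
) {
    if !seen.insert(fqn) {
        return; // already placed in the tree (shared ancestor or cycle)
    }
    out.push((depth, fqn.to_string()));
    if let Some(sources) = by_target.get(fqn) {
        for &src in sources {
            visit(src, depth + 1, by_target, seen, out);
        }
    }
}

fn main() {
    // Hypothetical FQNs; prints an indented tree rooted at app.T2.
    let edge = |from: &str, to: &str| FlowEdge {
        from: from.into(),
        to: to.into(),
        via_callable: "app.Sync".into(),
    };
    let flows = vec![edge("app.Raw", "app.T1"), edge("app.T1", "app.T2")];
    for (depth, fqn) in upstream_tree("app.T2", &flows) {
        println!("{}{}", "  ".repeat(depth), fqn);
    }
}
```

Emitting each FQN once turns a possibly cyclic flow graph into a tree; a renderer that wants to show repeated ancestors would need a different policy.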

Why

Today, provenance edges connect only resolved node/data FQNs (graph.rs:627). A bare from source such as raw makes expr_node_fqn (graph.rs:812) return None, so the edge is dropped. The flattened Step bodies (graph.rs:208) also discard binding names, so the analysis has to run at binding level over the AST.

Flow edges (the lineage tree)

Walk every Callable body, reusing the traversal shape of trace_block/trace_stmt (graph.rs:533). Maintain a per-body bindings: HashMap<name, &Expr> populated from each StmtKind::Assign. On ExprKind::From { ty, sources }, resolve each source to its originating data type:

  • bare binding raw → look up bindings["raw"] → resolve recursively;
  • call X.Method() → resolve X via resolve_path, take that method's return_ty (strip Result/Option/[]);
  • field access req.field → type of binding/param req (params from the enclosing Callable.params).
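For the call case, the method's return type has to be unwrapped before it names a data symbol. The sketch below assumes a hypothetical `TypeRef` enum standing in for the model's real type AST (whose node names and shape may differ, e.g. whether `Result` carries an error type) and peels `Result`, `Option` and list wrappers until a named type remains.

```rust
/// Hypothetical stand-in for the model's type AST; real node names may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TypeRef {
    Named(String),        // a plain type name, e.g. AccountRecord
    Result(Box<TypeRef>), // Result wrapper (error type omitted in this sketch)
    Option(Box<TypeRef>), // Option wrapper
    List(Box<TypeRef>),   // [] / list wrapper
}

/// Peel Result / Option / list wrappers, in any nesting order,
/// until the underlying named type is reached.
pub fn strip_wrappers(ty: &TypeRef) -> &str {
    match ty {
        TypeRef::Named(name) => name.as_str(),
        TypeRef::Result(inner) | TypeRef::Option(inner) | TypeRef::List(inner) => {
            strip_wrappers(inner)
        }
    }
}

fn main() {
    // Hypothetical return type of X.Fetch(): Result of a list of AccountRecord.
    let ret = TypeRef::Result(Box::new(TypeRef::List(Box::new(TypeRef::Named(
        "AccountRecord".to_string(),
    )))));
    println!("{}", strip_wrappers(&ret));
}
```

The stripped name would still need to go through resolve_path to become a full data FQN.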

Emit FlowEdge { from: upstream_data_fqn, to: resolve_path(ty) }. Cap the recursion depth and guard against binding cycles.
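A minimal sketch of the per-body resolution, assuming a heavily simplified, hypothetical `Expr` (not the real ExprKind) in which call return types are already resolved and unwrapped and types are plain FQN strings. `MAX_DEPTH`, `Ctx` and `flows_for_body` are illustrative names. A `visiting` set catches alias cycles such as `a = b; b = a`, and the depth cap bounds long alias chains.

```rust
use std::collections::{HashMap, HashSet};

const MAX_DEPTH: usize = 16; // hypothetical recursion cap

/// Hypothetical, simplified expression AST.
#[derive(Debug)]
enum Expr {
    Ident(String),                           // bare binding or param: raw
    Call { method_return: String },          // X.Method(), return type already resolved + unwrapped
    Field { base: String },                  // req.field: upstream is the type of req
    From { ty: String, sources: Vec<Expr> }, // T from { ... }
}

#[derive(Debug, PartialEq, Eq)]
struct FlowEdge { from: String, to: String, via_callable: String }

struct Ctx<'a> {
    bindings: HashMap<&'a str, &'a Expr>, // one entry per Assign in the body
    params: HashMap<&'a str, &'a str>,    // param name -> data FQN
}

/// Resolve a `from` source to the data FQN it originates from.
fn resolve_source<'a>(ctx: &Ctx<'a>, expr: &'a Expr, visiting: &mut HashSet<&'a str>, depth: usize) -> Option<String> {
    if depth > MAX_DEPTH {
        return None; // give up on very long alias chains
    }
    match expr {
        Expr::Ident(name) | Expr::Field { base: name } => resolve_name(ctx, name, visiting, depth),
        Expr::Call { method_return } => Some(method_return.clone()),
        Expr::From { ty, .. } => Some(ty.clone()),
    }
}

/// Bindings shadow params; a binding already on the resolution stack is a cycle.
fn resolve_name<'a>(ctx: &Ctx<'a>, name: &'a str, visiting: &mut HashSet<&'a str>, depth: usize) -> Option<String> {
    if let Some(&bound) = ctx.bindings.get(name) {
        if !visiting.insert(name) {
            return None; // cycle: `name` is already being resolved
        }
        let result = resolve_source(ctx, bound, visiting, depth + 1);
        visiting.remove(name);
        return result;
    }
    ctx.params.get(name).map(|fqn| fqn.to_string())
}

/// Emit one FlowEdge per resolvable source of every `from` assignment in the body.
fn flows_for_body<'a>(callable: &str, body: &'a [(String, Expr)], params: &[(&'a str, &'a str)]) -> Vec<FlowEdge> {
    let ctx = Ctx {
        bindings: body.iter().map(|(n, e)| (n.as_str(), e)).collect(),
        params: params.iter().copied().collect(),
    };
    let mut edges = Vec::new();
    for (_, expr) in body {
        if let Expr::From { ty, sources } = expr {
            for src in sources {
                let mut visiting = HashSet::new();
                if let Some(from) = resolve_source(&ctx, src, &mut visiting, 0) {
                    edges.push(FlowEdge { from, to: ty.clone(), via_callable: callable.to_string() });
                }
            }
        }
    }
    edges
}

fn main() {
    // raw = X.Fetch(); account = AccountData from { raw }; summary = Summary from { account }
    let body = vec![
        ("raw".to_string(), Expr::Call { method_return: "app.AccountRecord".into() }),
        ("account".to_string(), Expr::From { ty: "app.AccountData".into(), sources: vec![Expr::Ident("raw".into())] }),
        ("summary".to_string(), Expr::From { ty: "app.Summary".into(), sources: vec![Expr::Ident("account".into())] }),
    ];
    for edge in flows_for_body("app.Sync", &body, &[]) {
        println!("{} -> {} (via {})", edge.from, edge.to, edge.via_callable);
    }
}
```

Dropping unresolvable sources (rather than emitting a placeholder edge) is a choice of this sketch; the real analysis may want to surface them.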

Usages

Reuse the references-engine logic (resolve::resolve_at, mirrored at crates/pseudoscript-wasm/src/lib.rs:445): scan identifier tokens that resolve to the data symbol and classify each one by its syntactic role. Keep this in the model crate so it stays reusable outside wasm.
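The classification step could look like the sketch below. It assumes name resolution has already produced, for every identifier token, its resolved FQN, owning symbol and a syntactic context; `Site` and `Occurrence` are hypothetical shapes, not the real resolve_at output. Tokens in contexts that map to no UsageRole are simply skipped here.

```rust
/// Syntactic context of an identifier token (hypothetical).
#[derive(Debug, Clone, Copy)]
enum Site { CallableReturn, CallableParam, StructField, FromTarget, FromSource, Other }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UsageRole { ReturnType, Param, Field, FromTarget, FromSource }

#[derive(Debug, PartialEq, Eq)]
struct Usage { owner_fqn: String, role: UsageRole, line: u32, col: u32 }

/// One identifier occurrence after name resolution (hypothetical shape).
struct Occurrence<'a> {
    resolved_fqn: Option<&'a str>, // None when the token did not resolve
    owner_fqn: &'a str,            // enclosing callable/data symbol
    site: Site,
    line: u32,
    col: u32,
}

/// Map a syntactic context to a usage role; `None` means "not a usage we report".
fn role_for(site: Site) -> Option<UsageRole> {
    match site {
        Site::CallableReturn => Some(UsageRole::ReturnType),
        Site::CallableParam => Some(UsageRole::Param),
        Site::StructField => Some(UsageRole::Field),
        Site::FromTarget => Some(UsageRole::FromTarget),
        Site::FromSource => Some(UsageRole::FromSource),
        Site::Other => None,
    }
}

/// Keep occurrences that resolve to `data_fqn` and classify each by role.
fn usages_of(data_fqn: &str, occurrences: &[Occurrence<'_>]) -> Vec<Usage> {
    occurrences
        .iter()
        .filter(|o| o.resolved_fqn == Some(data_fqn))
        .filter_map(|o| {
            role_for(o.site).map(|role| Usage { owner_fqn: o.owner_fqn.to_string(), role, line: o.line, col: o.col })
        })
        .collect()
}

fn main() {
    let occ = vec![
        Occurrence { resolved_fqn: Some("app.AccountData"), owner_fqn: "app.Sync", site: Site::FromTarget, line: 4, col: 15 },
        Occurrence { resolved_fqn: Some("app.AccountData"), owner_fqn: "app.Load", site: Site::CallableReturn, line: 9, col: 20 },
    ];
    for u in usages_of("app.AccountData", &occ) {
        println!("{:?} in {} at {}:{}", u.role, u.owner_fqn, u.line, u.col);
    }
}
```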

Wiring

Add pub mod lineage; to crates/pseudoscript-model/src/lib.rs. Build the lineage in Graph::build and expose it as graph.lineage(fqn), so that emit stays a pure graph→scene projector.
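One way to wire this, sketched under assumptions: lineage is computed eagerly for every data symbol during build and stored in a map, and the accessor returns `Option<&Lineage>`. `Graph::build` here takes a plain list of FQNs instead of a workspace, and `Lineage::for_symbol` is a placeholder; both signatures are hypothetical.

```rust
use std::collections::HashMap;

/// Trimmed stand-in for the proposed Lineage (flows and usages elided).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Lineage {
    pub of: String,
}

impl Lineage {
    /// Placeholder for Lineage::for_symbol(workspace, data_fqn); the real one walks the AST.
    fn for_symbol(data_fqn: &str) -> Self {
        Lineage { of: data_fqn.to_string() }
    }
}

pub struct Graph {
    lineage: HashMap<String, Lineage>, // filled once during build
}

impl Graph {
    /// Hypothetical build: computes lineage for every data symbol up front,
    /// so emit only reads precomputed results.
    pub fn build(data_fqns: &[String]) -> Self {
        let lineage = data_fqns
            .iter()
            .map(|fqn| (fqn.clone(), Lineage::for_symbol(fqn)))
            .collect();
        Graph { lineage }
    }

    /// Read-only accessor; None for FQNs that are not data symbols.
    pub fn lineage(&self, fqn: &str) -> Option<&Lineage> {
        self.lineage.get(fqn)
    }
}

fn main() {
    let graph = Graph::build(&["app.AccountData".to_string()]);
    println!("{:?}", graph.lineage("app.AccountData"));
}
```

Eager construction keeps emit free of analysis work; if building lineage for every symbol proves costly, a lazily filled cache behind the same accessor would be an alternative.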

Tests

  • raw = X.Fetch(); account = T from { raw } → FlowEdge { from: "..AccountRecord", to: "..AccountData" }, plus a Usage{FromTarget} at the composition site.
  • Transitive: a = T1 from {..}; b = T2 from { a } → a chain of flow edges.
  • Param, field, and cycle-guard cases.
  • Run with cargo test -p pseudoscript-model.

Rust authoring: use the idiomatic-rust skill.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests