feat: GraphQL SDL extraction + federation, operation→resolver & call-site graph links #1438

Open
daniil-kzn wants to merge 3 commits into Graphify-Labs:v8 from daniil-kzn:feat/graphql-sdl-extractor

@daniil-kzn daniil-kzn commented Jun 23, 2026

What / why

graphify's structural extraction is tree-sitter based, and there is no
tree-sitter grammar for GraphQL. As a result .graphqls / .graphql schema
files are silently skipped — the types, inputs and mutations defined in SDL
never enter the graph — and the call sites that invoke those operations live
inside gql`...` / graphql:"..." string literals, which tree-sitter
indexes as opaque text. So the GraphQL contract layer, first-class structure in
a lot of codebases (gqlgen, Apollo, federation), is invisible to the graph.

This adds GraphQL support along the existing extractor seams: an SDL extractor
wired through _get_extractor, a call-site extractor folded into the per-file
code extraction, plus graph-stitching passes that connect the new nodes to the
code and across repos. There is zero behavior change unless GraphQL is present
— every new pass early-returns when no gql_* nodes exist and the call-site
scan yields nothing for files without a GraphQL literal.

The end result closes a loop graphify couldn't close before: from a backend
operation, traverse to every consumer across repos — frontend documents and
Go service clients alike — that a change to it would affect.

Design

1. SDL extractor — graphify/graphql_sdl.py (new)

  • Parses each schema file with graphql-core into the standard per-file
    {"nodes": [...], "edges": [...]} shape, wired in via _get_extractor
    (extract.py) + .graphqls / .graphql added to CODE_EXTENSIONS.
  • Emits structured nodes (not a plain-text sidecar): gql_type, gql_input,
    gql_interface, gql_enum (+ values), gql_scalar, gql_union, fields, and
    root Mutation/Query fields as gql_operation. Apollo Federation type X @key
    is tagged gql_entity (federation=entity), extend type X @key as
    federation=extends.
  • Edges: type --contains--> field, field --references--> named type,
    operation --references--> input, operation --returns--> return type.
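
A minimal sketch of the extraction shape described above, not the code in graphify/graphql_sdl.py. It only handles plain object types, and the function name `extract_sdl`, the `gql_field` kind, the field id scheme and the `error` key are assumptions made for illustration. It also shows the lazy import and the error-marker fallback.

```python
def extract_sdl(source: str) -> dict:
    """Return {"nodes": [...], "edges": [...]} for one SDL document."""
    try:
        # Lazy import: without graphql-core the pass degrades to a no-op.
        from graphql import GraphQLError, parse
        from graphql.language import NamedTypeNode, ObjectTypeDefinitionNode
    except ImportError:
        return {"nodes": [], "edges": []}

    try:
        document = parse(source)
    except GraphQLError as exc:
        # Malformed schema: hand back an error marker instead of raising.
        return {"nodes": [], "edges": [], "error": str(exc)}

    def named(type_node) -> str:
        # Unwrap list / non-null wrappers down to the named type.
        while not isinstance(type_node, NamedTypeNode):
            type_node = type_node.type
        return type_node.name.value

    nodes, edges = [], []
    for definition in document.definitions:
        if not isinstance(definition, ObjectTypeDefinitionNode):
            continue  # sketch only: inputs, enums, unions etc. are omitted
        type_name = definition.name.value
        type_id = f"gql_{type_name}"  # name-keyed id, as in the PR
        is_root = type_name in ("Query", "Mutation")
        nodes.append({"id": type_id, "kind": "gql_type", "name": type_name})
        for field in definition.fields or ():
            field_id = f"{type_id}_{field.name.value}"  # hypothetical id scheme
            kind = "gql_operation" if is_root else "gql_field"
            nodes.append({"id": field_id, "kind": kind, "name": field.name.value})
            edges.append({"source": type_id, "target": field_id, "relation": "contains"})
            # Root fields "return" their type; ordinary fields "reference" it.
            relation = "returns" if is_root else "references"
            edges.append({"source": field_id,
                          "target": f"gql_{named(field.type)}",
                          "relation": relation})
            if is_root:
                for argument in field.arguments or ():
                    edges.append({"source": field_id,
                                  "target": f"gql_{named(argument.type)}",
                                  "relation": "references"})
    return {"nodes": nodes, "edges": edges}
```

Calling `extract_sdl("type Mutation { createUser(input: NewUser!): User! }")` returns the same dictionary shape whether or not graphql-core is installed; without the library both lists are simply empty.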

2. Operation → resolver links + dedup — graphify/extract.py

  • _consolidate_gql_duplicates: SDL nodes use name-keyed ids (gql_<name>) so a
    type split across files collapses to one node; a federated owner outranks a
    plain stub.
  • _link_gql_operations_to_resolvers: matches a gql_operation to the resolver
    function that implements it by normalized name, bridging schema ↔ code.
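
Both passes can be sketched roughly as follows. This is an illustration under assumptions rather than the code in extract.py: the `function` node kind, the `resolves` relation name and the exact normalization rule (lowercase, strip non-alphanumerics) are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop separators so createUser, CreateUser and create_user agree."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def _rank(node: dict) -> int:
    # A federated owner (federation=entity) outranks a plain stub.
    return 1 if node.get("federation") == "entity" else 0


def consolidate(nodes: list[dict]) -> list[dict]:
    """Collapse nodes that share a name-keyed id, keeping the highest-ranked one."""
    best: dict[str, dict] = {}
    for node in nodes:
        current = best.get(node["id"])
        if current is None or _rank(node) > _rank(current):
            best[node["id"]] = node
    return list(best.values())


def link_operations_to_resolvers(nodes: list[dict]) -> list[dict]:
    """Emit one edge per gql_operation whose normalized name matches a function."""
    functions = {
        normalize(n["name"]): n["id"]
        for n in nodes
        if n["kind"] == "function"  # hypothetical kind for code symbols
    }
    edges = []
    for node in nodes:
        if node["kind"] != "gql_operation":
            continue
        resolver_id = functions.get(normalize(node["name"]))
        if resolver_id is not None:
            edges.append({"source": node["id"],
                          "target": resolver_id,
                          "relation": "resolves"})  # hypothetical relation name
    return edges
```

Matching on a normalized name keeps the link independent of casing conventions (a Go `CreateUser` method and an SDL `createUser` field), at the cost of possible false matches when unrelated functions share a name.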

3. Operation call sites — graphify/graphql_calls.py (new) + extract.py

  • find_gql_operation_calls scans a source file for where code invokes an
    operation: the root selections of gql`...` / graphql`...` tagged
    template literals in TS/JS (a mutation's root field is the operation it calls),
    and the operation named in a Go graphql:"..." struct tag. Nested fields,
    aliases, inline-object arguments and fragment spreads are skipped, so the
    result is the operations the document actually calls — no string-content
    guessing, no graphql-core dependency.
  • _compose_with_gql_calls folds this into the per-file code extractor returned
    by _get_extractor, so each gql_call node is cached and incrementally
    updated exactly like the AST nodes, and is anchored to the nearest
    enclosing symbol with a references edge.
  • _link_gql_calls_to_operations: per-repo, links a gql_call to the
    gql_operation of the same name (a service calling its own operation).
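
To make the root-selection idea concrete, here is a rough pure-text sketch. It is not the scanner in graphify/graphql_calls.py: `scan_operation_calls`, `root_selections`, the regexes and the `line` key are all hypothetical. GraphQL block strings are not handled, and every Go tag is treated as a call, which a real scanner would presumably narrow to root-level tags.

```python
import re

TEMPLATE_RE = re.compile(r"\b(?:gql|graphql)\s*`([^`]*)`")   # TS/JS tagged templates
GO_TAG_RE = re.compile(r'graphql:"\s*([A-Za-z_]\w*)')        # Go struct tags
NAME_RE = re.compile(r"[A-Za-z_]\w*")
ALIAS_RE = re.compile(r"\s*:")
DEFINITION_KEYWORDS = {"query", "mutation", "subscription", "fragment"}


def root_selections(document: str) -> list[str]:
    """Return the root field names of each operation in a GraphQL document."""
    names: list[str] = []
    depth = 0               # selection-set nesting; 1 is the root selection set
    parens = 0              # > 0 while inside (...) variable or argument lists
    definition = "query"    # an anonymous document is a query
    skip_next_name = False  # set after "..." or "@" (spread / directive name)
    i, n = 0, len(document)
    while i < n:
        ch = document[i]
        if ch == "#":                        # comment: skip to end of line
            end = document.find("\n", i)
            i = n if end == -1 else end
            continue
        if ch == '"':                        # string literal: skip its content
            end = document.find('"', i + 1)
            i = n if end == -1 else end + 1
            continue
        if document.startswith("${", i):     # JS interpolation inside the template
            end = document.find("}", i)
            i = n if end == -1 else end + 1
            continue
        if ch == "(":
            parens += 1
        elif ch == ")":
            parens -= 1
        elif parens == 0 and ch == "{":      # braces inside arguments are ignored
            depth += 1
        elif parens == 0 and ch == "}":
            depth -= 1
        elif parens == 0 and ch in "@.":
            skip_next_name = True
        elif parens == 0 and (match := NAME_RE.match(document, i)):
            word, i = match.group(), match.end()
            if depth == 0 and word in DEFINITION_KEYWORDS:
                definition = word
            elif skip_next_name:
                skip_next_name = word == "on"  # "... on Type": skip Type as well
            elif depth == 1 and definition != "fragment":
                if not ALIAS_RE.match(document, i):  # "alias: field" -> keep field
                    names.append(word)
            continue
        i += 1
    return names


def scan_operation_calls(source: str) -> list[dict]:
    """Build gql_call-style records for every operation a file invokes."""
    calls = []
    for match in TEMPLATE_RE.finditer(source):
        line = source.count("\n", 0, match.start()) + 1
        for name in root_selections(match.group(1)):
            calls.append({"kind": "gql_call", "name": name, "line": line})
    for match in GO_TAG_RE.finditer(source):
        line = source.count("\n", 0, match.start()) + 1
        calls.append({"kind": "gql_call", "name": match.group(1), "line": line})
    return calls
```

Tracking brace depth separately from parenthesis depth is what lets inline-object arguments such as `createUser(input: {name: "x"})` pass through without being mistaken for a nested selection set.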

4. Cross-repo stitch — graphify/global_graph.py

  • _stitch_federation: same entity name across two repos is the same federated
    entity, so each extend-side reference gets a federation_key edge to the
    owning service.
  • _stitch_gql_calls: a gql_call in one repo (e.g. a frontend) links with a
    calls edge to the gql_operation of the same name in the repo that defines
    it (e.g. a backend) — the cross-repo frontend→mutation link.
  • Both are idempotent (prior edges dropped first) and re-run on every
    global_add.
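
The "drop prior edges, then re-link" idempotency can be sketched as below. This is an assumption-laden illustration, not the code in global_graph.py: the graph is modelled as a plain dict, and the `repo` and `stitched_by` keys and the function name `stitch_gql_calls` are hypothetical.

```python
def stitch_gql_calls(graph: dict) -> dict:
    """Link each gql_call to the same-named gql_operation defined in another repo."""
    nodes = graph["nodes"]
    # Early return: no GraphQL nodes means no behavior change.
    if not any(node["kind"].startswith("gql_") for node in nodes):
        return graph

    # Idempotency: drop edges from a previous run before adding fresh ones.
    graph["edges"] = [
        edge for edge in graph["edges"] if edge.get("stitched_by") != "gql_calls"
    ]

    # Index operations by name (if two repos define the same name, the last wins
    # in this sketch; a real pass would need a tie-break rule).
    operations = {
        node["name"]: node for node in nodes if node["kind"] == "gql_operation"
    }

    for node in nodes:
        if node["kind"] != "gql_call":
            continue
        operation = operations.get(node["name"])
        if operation is None or operation["repo"] == node["repo"]:
            continue  # same-repo calls are linked by the per-repo pass
        graph["edges"].append({
            "source": node["id"],
            "target": operation["id"],
            "relation": "calls",
            "stitched_by": "gql_calls",  # hypothetical marker used for cleanup
        })
    return graph
```

Because the pass removes its own earlier output first, running it on every global_add yields the same edge set no matter how many times it runs.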

Footprint / safety

  • Lazy graphql-core import keeps the SDL pass optional: missing lib → no-op;
    malformed schema → error marker instead of raising (one bad file can't abort
    a run). The call-site scan is pure-text and dependency-free.
  • Adds graphql-core>=3.2,<4. build.py is unchanged. AST-cache salt bumped so
    existing caches regenerate against the new extractor output.

Tests

tests/test_graphql_sdl.py (types/inputs/enums/operations, op→input/return
edges, @key entity tagging, malformed-input safety),
tests/test_gql_federation.py (consolidation, op→resolver matching, cross-repo
federation stitch + idempotency), and tests/test_gql_calls.py (root-selection
parse with args/aliases/inline-object args/fragments, Go tag parse, gql_call node
shape, per-repo and cross-repo call→operation linking + idempotency). All green;
full suite passes.

Scope note

Intentionally narrow: GraphQL is one structured contract you already have in the
repo, wired through existing seams, mirroring the merged --cargo (#1271) and
PowerShell .psd1 (#1341) extractors. It does not broaden graphify into a
general ingestion layer (no new doc formats, archives, or network fetching) and
does not turn SDL or gql literals into opaque text — it produces the same kind of
structured, code-linked graph graphify already builds. The four passes are
additive and independently reviewable; happy to split them into separate PRs if
you'd prefer.

daniil-kzn and others added 2 commits June 24, 2026 17:38
graphify has no tree-sitter grammar for GraphQL, so .graphqls schema files
are silently skipped — types, inputs and mutations defined in SDL never enter
the graph. This adds a focused, graphql-core based per-file extractor wired
through the existing `_get_extractor` dispatch (same mechanism as the
blade/mcp/manifest special cases). Zero behavior change unless a .graphqls /
.graphql file is present.

Emits structured nodes (not a plain-text sidecar): types, inputs, interfaces,
enums (+ values), scalars, unions, their fields, and root Mutation/Query
fields as operations. Edges:

  type      --contains-->   field
  field     --references--> field's named type
  operation --references--> argument input type
  operation --returns-->    return type

- New module graphify/graphql_sdl.py; adds graphql-core dependency.
- Degrades to a no-op if graphql-core is unavailable; malformed schemas
  return an error marker instead of raising.
- tests/test_graphql_sdl.py covers types/inputs/enums/operations, the
  operation->input/return edges, and malformed-input safety.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@daniil-kzn daniil-kzn force-pushed the feat/graphql-sdl-extractor branch from a2258de to e1b8f62 Compare June 24, 2026 14:38
@daniil-kzn daniil-kzn changed the title feat(extract): add GraphQL SDL (.graphqls/.graphql) extractor feat: GraphQL SDL extraction + federation/operation graph links Jun 24, 2026
Adds a call-site extractor that captures where code *invokes* a GraphQL
operation — gql`...` / graphql`...` tagged template literals in TS/JS and
graphql:"..." struct tags in Go — which tree-sitter indexes as opaque string
literals and the SDL pass therefore can't reach. Each call site becomes a
`gql_call` node; a per-repo pass links same-repo calls and a global stitch links
cross-repo calls to the owning `gql_operation` by name, so a frontend's call to a
backend mutation becomes a real edge.

This closes the loop opened by the SDL extractor: with calls linked, a query for
a backend operation surfaces every consumer it would affect across repos
(frontend documents and Go service clients alike).

- graphify/graphql_calls.py: pure scanner (root-selection parse for TS, tag
  parse for Go) + gql_call node builder. graphql-core not required.
- graphify/extract.py: fold call-site extraction into the per-file code
  extractor (so it caches/incrementally updates like AST) + per-repo
  call->operation linking.
- graphify/global_graph.py: _stitch_gql_calls — idempotent cross-repo linking,
  re-run on every global_add alongside _stitch_federation.
- graphify/cache.py: bump extractor salt so the AST cache regenerates.
- tests/test_gql_calls.py: scanner edge cases (args, aliases, inline-object
  args, fragments, Go tags) + per-repo and cross-repo linking + idempotency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@daniil-kzn daniil-kzn changed the title feat: GraphQL SDL extraction + federation/operation graph links feat: GraphQL SDL extraction + federation, operation→resolver & call-site graph links Jun 24, 2026