LangForge

LangForge is a scanner/parser generator for building DSLs, validators, transpilers, compilers, code generators, and language tooling from one readable .lf grammar file.

Write the scanner and parser once, then generate target-native code for:

  • Go
  • C#
  • C
  • C++

LangForge is designed around typed reducer APIs, multi-target generation, parser recovery, expected-token diagnostics, and a clean separation between generated automata and handwritten application logic.

Why LangForge?

When you build a small language or DSL, you usually need to:

  1. split source text into tokens;
  2. check that the token sequence follows a grammar;
  3. turn recognized syntax into useful application objects;
  4. report helpful syntax errors;
  5. keep generated parsing code separate from handwritten business logic.

LangForge generates the scanner and parser machinery. Your code owns the meaning: AST nodes, reducers, validation, compilation, interpretation, rendering, reporting, or whatever else your language is meant to do.

What Can You Build With It?

LangForge is useful for:

  • small DSLs;
  • configuration languages;
  • validators;
  • code generators;
  • transpilers;
  • educational compilers;
  • report or query languages;
  • parser experiments;
  • language-tooling prototypes.

The repository includes calculator expressions, a DataKeeper-style scripting DSL, the DRAW rendering language, vehicle-report parsing, parser recovery demos, mini-compiler templates, reusable library-style DSL templates, a modern C# layered compiler template, and a layered modern C++ compiler template with a parser facade and CMake build.

Typed Reducers Instead Of Stack Indexing

A grammar rule can label right-hand-side values:

%semantic go Expr float64
%semantic go Term float64

Expr
  : left=Expr Plus right=Term {go: add}
  | value=Term {go: pass}
  ;

LangForge uses those labels and semantic type declarations to generate typed reducer contexts:

reducers := parser.ReducerMap{
	parser.SemanticActionAdd: parser.TypedAdd(
		func(ctx parser.AddReduction) (float64, error) {
			return ctx.Left + ctx.Right, nil
		},
	),
}

Handwritten reducer code can use ctx.Left and ctx.Right instead of manual positions such as ctx.Values[0] and ctx.Values[2].
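
To make the contrast concrete, the following self-contained Go sketch puts an index-based reducer next to a typed one. The untypedContext and addReduction types are hypothetical stand-ins written for this illustration; they are not the code LangForge generates, and the real generated contexts may look different.

```go
package main

import "fmt"

// untypedContext mimics a reducer context that exposes raw parser-stack
// values. Hypothetical type for illustration only.
type untypedContext struct {
	Values []any
}

// addReduction mimics a typed context for `left=Expr Plus right=Term`.
// Hypothetical type; the generated parser.AddReduction may differ.
type addReduction struct {
	Left  float64
	Right float64
}

// addByIndex has to know the positions of its operands and assert their
// types at run time.
func addByIndex(ctx untypedContext) (float64, error) {
	if len(ctx.Values) != 3 {
		return 0, fmt.Errorf("expected 3 values, got %d", len(ctx.Values))
	}
	left, ok := ctx.Values[0].(float64)
	if !ok {
		return 0, fmt.Errorf("value 0 is %T, want float64", ctx.Values[0])
	}
	right, ok := ctx.Values[2].(float64)
	if !ok {
		return 0, fmt.Errorf("value 2 is %T, want float64", ctx.Values[2])
	}
	return left + right, nil
}

// addTyped needs no assertions: the compiler checks the field types.
func addTyped(ctx addReduction) (float64, error) {
	return ctx.Left + ctx.Right, nil
}

func main() {
	byIndex, errIndex := addByIndex(untypedContext{Values: []any{1.5, "+", 2.0}})
	typed, errTyped := addTyped(addReduction{Left: 1.5, Right: 2.0})
	fmt.Println(byIndex, errIndex, typed, errTyped)
}
```

In the index-based version, reordering the grammar rule silently changes which slot holds which operand. In the typed version, the labels travel with the values.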

Pipeline

source text
  -> generated scanner / token source
  -> generated LR parser
  -> typed reducer
  -> AST / model / command / report
  -> compiler / interpreter / renderer / validator

The preferred production path is pull-based and lazy: a generated scanner feeds tokens to the generated parser as the parser asks for them. Collection-based token APIs are still available for debugging, teaching, tests, and token-stream inspection.
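
The pull-based shape described above can be sketched in a few lines of plain Go. Everything here (Token, TokenSource, lazyScanner, sumNumbers) is a hypothetical name chosen for the sketch rather than a LangForge API; the point is only that the consumer asks for one token at a time, so the scanner never reads past the place where parsing stops.

```go
package main

import (
	"fmt"
	"strconv"
)

// Token and TokenSource are hypothetical names used only in this sketch.
type Token struct {
	Kind string // "num", "plus", or "eof"
	Text string
}

type TokenSource interface {
	Next() (Token, error)
}

// lazyScanner scans one token per Next call and counts the tokens it has
// produced so that laziness is observable.
type lazyScanner struct {
	input    string
	pos      int
	Produced int
}

func (s *lazyScanner) Next() (Token, error) {
	for s.pos < len(s.input) && s.input[s.pos] == ' ' {
		s.pos++ // skip spaces between tokens
	}
	if s.pos >= len(s.input) {
		return Token{Kind: "eof"}, nil
	}
	c := s.input[s.pos]
	start := s.pos
	kind := ""
	switch {
	case c == '+':
		kind = "plus"
		s.pos++
	case c >= '0' && c <= '9':
		kind = "num"
		for s.pos < len(s.input) && s.input[s.pos] >= '0' && s.input[s.pos] <= '9' {
			s.pos++
		}
	default:
		return Token{}, fmt.Errorf("unexpected byte %q at offset %d", c, s.pos)
	}
	s.Produced++
	return Token{Kind: kind, Text: s.input[start:s.pos]}, nil
}

// sumNumbers is a tiny stand-in for a parser: it accepts num (plus num)*
// and pulls tokens from the source only as it needs them.
func sumNumbers(src TokenSource) (int, error) {
	total := 0
	for {
		tok, err := src.Next()
		if err != nil {
			return 0, err
		}
		if tok.Kind != "num" {
			return 0, fmt.Errorf("expected num, got %s", tok.Kind)
		}
		n, err := strconv.Atoi(tok.Text)
		if err != nil {
			return 0, err
		}
		total += n
		tok, err = src.Next()
		if err != nil {
			return 0, err
		}
		switch tok.Kind {
		case "eof":
			return total, nil
		case "plus":
			continue
		default:
			return 0, fmt.Errorf("expected plus or eof, got %s", tok.Kind)
		}
	}
}

func main() {
	src := &lazyScanner{input: "1 + 2 + 39"}
	sum, err := sumNumbers(src)
	fmt.Println(sum, err, src.Produced)
}
```

When the consumer fails early, for example on the input "1 + + 2 + 3", the scanner in this sketch has produced only three tokens. A collection-based API would have scanned the whole input first.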

Highlights

  • One .lf file for scanner and parser definitions.
  • Generated scanner/parser code for Go, C#, C, and C++.
  • LR parser modes: SLR(1), LALR(1), IELR(1), and canonical LR(1).
  • Named grammar values such as left=Expr and right=Term.
  • Typed reducer contexts/adapters instead of manual parser-stack indexing.
  • Pull-based token-source parsing for lazy scanner-to-parser pipelines.
  • Parser error recovery with expected-token diagnostics.
  • Deterministic langforge.actions.json action manifests.
  • Copyable mini-compiler, library-style DSL, and modern C#/C++ layered templates.
  • Example parity gates for cross-target grammar and semantic-contract drift.

Quick Start

Validate a grammar:

go run ./cmd/lang-forge validate --spec examples/go/calc/calc.lf

Inspect parser tables:

go run ./cmd/lang-forge inspect --spec examples/go/calc/calc.lf --format text

Run a Go example:

make -C examples/go/calc run

Run a reusable library-style template:

make -C examples/templates/go/library-dsl test

If go is not on your PATH, use the full path to your Go toolchain. The current development workspace uses /usr/local/go/bin/go.

Choose Your Starting Point

| Goal | Start here |
| --- | --- |
| Learn the basics | examples/go/calc |
| Build a small compiler pipeline | examples/templates/go/mini-compiler |
| Build a reusable DSL library | examples/templates/go/library-dsl |
| Build a layered C# compiler facade | examples/templates/csharp/layered-compiler |
| Build a layered C++ compiler facade | examples/templates/cpp/layered-compiler |
| See parser recovery | examples/go/parser-recovery |
| See a renderer-style language | examples/go/draw |
| Compare target languages | examples |
| Understand generated semantics | doc/generated-code-and-semantics.md |
| Understand handwritten integration | doc/handwritten-integration-guide.md |

Generated Targets

| Target | Generated output | Semantic API | Notes |
| --- | --- | --- | --- |
| Go | tokens.go, scanner.go, parser.go | typed reducer contexts, reducer maps | primary workflow and richest examples |
| C# | Tokens.g.cs, Scanner.g.cs, Parser.g.cs | typed reducer contexts, action enums | nullable-aware .g.cs output |
| C | tokens.h, scanner.h/.c, parser.h/.c, parser_typed.h | typed reducer structs, function pointers | reentrant APIs and explicit ownership |
| C++ | tokens.hpp, scanner.hpp/.cpp, parser.hpp/.cpp, parser_typed.hpp | typed adapters and reducer maps | C++17 output |

All targets also write deterministic manifest files, including langforge.actions.json, so examples and downstream projects can verify the semantic contract produced from a grammar.

Examples And Templates

examples contains runnable projects grouped by target language: examples/go, examples/csharp, examples/c, and examples/cpp.

examples/templates are copyable starting points:

  • The mini-compiler templates show a small front end, stack-machine lowering, and mock execution.
  • The library-style DSL templates hide generated parser details behind a stable domain API, which is the recommended shape for real tools.
  • The C# layered compiler template shows Ast/, Semantics/, Parsing/, a public IMiniCompilerParser, domain ParseResult<T>, and DI-friendly semantic policy injection.
  • The C++ layered compiler template goes one step further with public headers under include/, generated output isolated under generated/, direct typed reducers, intentional std::unique_ptr/std::variant ownership, a domain parser facade, and CMake integration.

Parser Algorithms

LangForge builds LR parser automata and reports conflicts with source spans where possible. LALR(1) is the default because it is compact and familiar, but grammars can select other algorithms:

  • %type slr for SLR(1);
  • %type lalr for LALR(1);
  • %type ielr for IELR(1);
  • %type canonical for canonical LR(1).

LR(0) item sets are part of the implementation and documentation, even when the selected parser table uses a lookahead-aware algorithm. See Parser algorithms for worked examples, automata shape, conflict behavior, and when to choose each mode.
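
For readers new to LR(0) item sets, here is a minimal Go sketch of the textbook closure operation applied to the Expr grammar from earlier in this README. It is written for illustration and is not taken from the LangForge implementation. The Start production and the Term -> Number rule are assumptions added so the grammar is complete.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type production struct {
	Head string
	Body []string
}

// item is an LR(0) item: a production index plus a dot position.
type item struct {
	Prod int
	Dot  int
}

// grammar is the Expr grammar with an assumed augmented start rule and an
// assumed Term -> Number rule. Plus and Number are terminals.
var grammar = []production{
	{"Start", []string{"Expr"}},
	{"Expr", []string{"Expr", "Plus", "Term"}},
	{"Expr", []string{"Term"}},
	{"Term", []string{"Number"}},
}

// closure adds, for every item with the dot before a nonterminal, the
// initial items of that nonterminal's productions, until nothing changes.
func closure(kernel []item) []item {
	seen := map[item]bool{}
	work := append([]item(nil), kernel...)
	var out []item
	for len(work) > 0 {
		it := work[0]
		work = work[1:]
		if seen[it] {
			continue
		}
		seen[it] = true
		out = append(out, it)
		body := grammar[it.Prod].Body
		if it.Dot >= len(body) {
			continue // dot at the end: nothing to expand
		}
		next := body[it.Dot]
		for i, p := range grammar {
			if p.Head == next {
				work = append(work, item{Prod: i, Dot: 0})
			}
		}
	}
	// Sort so the result is deterministic regardless of discovery order.
	sort.Slice(out, func(a, b int) bool {
		if out[a].Prod != out[b].Prod {
			return out[a].Prod < out[b].Prod
		}
		return out[a].Dot < out[b].Dot
	})
	return out
}

// formatItem renders an item with "." marking the dot position.
func formatItem(it item) string {
	p := grammar[it.Prod]
	parts := append([]string(nil), p.Body[:it.Dot]...)
	parts = append(parts, ".")
	parts = append(parts, p.Body[it.Dot:]...)
	return p.Head + " -> " + strings.Join(parts, " ")
}

func main() {
	for _, it := range closure([]item{{Prod: 0, Dot: 0}}) {
		fmt.Println(formatItem(it))
	}
}
```

The SLR(1), LALR(1), IELR(1), and canonical LR(1) constructions differ in how they compute lookahead sets and, for IELR(1) and canonical LR(1), in whether states are split. The closure step is the building block that all of them share.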

Error Recovery And Diagnostics

Generated parsers support grammar-directed recovery with reserved error productions, synchronization terminals, expected-token aliases and groups, partial results, and structured diagnostics. Recovery examples are available for all generated targets: examples/go/parser-recovery, examples/csharp/parser-recovery, examples/c/parser-recovery, and examples/cpp/parser-recovery.

See Parser error recovery for the grammar patterns and generated APIs.
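
As a rough intuition for synchronization terminals and expected-token diagnostics, the Go sketch below hand-codes panic-mode recovery for a toy language of `ident ;` statements. It is a simplified illustration under assumed names (diagnostic, parseStatements) and does not show the recovery API that LangForge generates. A generated LR parser drives recovery from its tables and error productions instead.

```go
package main

import "fmt"

// diagnostic is a hypothetical structured error: where parsing failed,
// what was found, and what would have been accepted.
type diagnostic struct {
	Index    int
	Found    string
	Expected []string
}

// isIdent reports whether tok consists only of lowercase ASCII letters.
func isIdent(tok string) bool {
	if tok == "" {
		return false
	}
	for _, r := range tok {
		if r < 'a' || r > 'z' {
			return false
		}
	}
	return true
}

// parseStatements accepts a sequence of `ident ;` statements. On an error
// it records a diagnostic, skips to just past the next ";" (the
// synchronization terminal), and keeps going, so the caller receives
// partial results plus every diagnostic.
func parseStatements(tokens []string) ([]string, []diagnostic) {
	var stmts []string
	var diags []diagnostic
	at := func(i int) string {
		if i < len(tokens) {
			return tokens[i]
		}
		return "eof"
	}
	i := 0
	for i < len(tokens) {
		failed := -1
		var expected []string
		switch {
		case !isIdent(at(i)):
			failed, expected = i, []string{"ident"}
		case at(i+1) != ";":
			failed, expected = i+1, []string{";"}
		default:
			stmts = append(stmts, tokens[i])
			i += 2
			continue
		}
		diags = append(diags, diagnostic{failed, at(failed), expected})
		// Panic-mode recovery: drop tokens up to and including the next ";".
		i = failed
		for i < len(tokens) && tokens[i] != ";" {
			i++
		}
		i++
	}
	return stmts, diags
}

func main() {
	stmts, diags := parseStatements([]string{"a", ";", "b", "1", "c", ";", "d", ";"})
	fmt.Println(stmts, diags)
}
```

With the input in main, the malformed statement starting at `b` is dropped, one diagnostic reports that `;` was expected where `1` was found, and parsing resumes at `d`.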

Install Or Update

Install the latest release binary with curl:

curl -fsSL https://github.com/russlank/lang-forge/releases/latest/download/install-lang-forge.sh | sh

Or with wget:

wget -qO- https://github.com/russlank/lang-forge/releases/latest/download/install-lang-forge.sh | sh

The installer detects the supported OS/architecture pair, downloads the matching release binary, verifies SHA256SUMS, and installs lang-forge to ${PREFIX:-/usr/local}/bin. Use a user-writable directory when you do not want sudo:

curl -fsSL https://github.com/russlank/lang-forge/releases/latest/download/install-lang-forge.sh \
  | LANG_FORGE_INSTALL_DIR="$HOME/.local/bin" sh

Set LANG_FORGE_REPO_URL when installing from a fork or mirror that publishes the same release asset names.
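
If you want to reproduce the installer's checksum step in your own tooling, the idea can be sketched in Go as follows. This assumes SHA256SUMS uses the common sha256sum layout of a hex digest followed by a file name on each line; the actual layout of the release file is not shown in this README. The asset name in main is made up for the example, and sumLine and verifyChecksum are hypothetical helpers.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sumLine formats one line in the assumed sha256sum layout.
func sumLine(name string, data []byte) string {
	digest := sha256.Sum256(data)
	return hex.EncodeToString(digest[:]) + "  " + name
}

// verifyChecksum finds the entry for name in sums and compares it with the
// SHA-256 digest of data.
func verifyChecksum(sums, name string, data []byte) error {
	digest := sha256.Sum256(data)
	got := hex.EncodeToString(digest[:])
	sc := bufio.NewScanner(strings.NewReader(sums))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // not a "digest name" line
		}
		// sha256sum marks binary-mode entries with a leading "*".
		if strings.TrimPrefix(fields[1], "*") != name {
			continue
		}
		if strings.EqualFold(fields[0], got) {
			return nil
		}
		return fmt.Errorf("checksum mismatch for %s", name)
	}
	return fmt.Errorf("no checksum entry for %s", name)
}

func main() {
	payload := []byte("example release payload")
	// "lang-forge-example.tar.gz" is a made-up asset name.
	sums := sumLine("lang-forge-example.tar.gz", payload)
	fmt.Println(verifyChecksum(sums, "lang-forge-example.tar.gz", payload))
}
```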

From a source checkout, you can also run the tool directly:

go run ./cmd/lang-forge version

Requirements

The core tool needs Go 1.26.4 or a compatible newer toolchain plus make. The full example and CI suite also needs the .NET 10.0 SDK for C# examples, GCC or another C11 compiler for C examples and Go race tests, and a C++17 compiler for C++ examples.

See Requirements for the complete toolchain matrix and target-specific notes.

Running Examples

The main demos exist in Go, C#, C, and C++:

| Example | Go | C# | C | C++ |
| --- | --- | --- | --- | --- |
| Calculator | make -C examples/go/calc run | make -C examples/csharp/calc run | make -C examples/c/calc run | make -C examples/cpp/calc run |
| DataKeeper DSL | make -C examples/go/datakeeper run | make -C examples/csharp/datakeeper run | make -C examples/c/datakeeper run | make -C examples/cpp/datakeeper run |
| DRAW renderer | make -C examples/go/draw run | make -C examples/csharp/draw run | make -C examples/c/draw run | make -C examples/cpp/draw run |
| Vehicle report | make -C examples/go/vehicle-report run | make -C examples/csharp/vehicle-report run | make -C examples/c/vehicle-report run | make -C examples/cpp/vehicle-report run |
| Parser recovery | make -C examples/go/parser-recovery run | make -C examples/csharp/parser-recovery run | make -C examples/c/parser-recovery run | make -C examples/cpp/parser-recovery run |

Example Makefiles run LangForge from source by default with go run ../../../cmd/lang-forge. After building a standalone utility, the same examples can use it:

make build
make -C examples/go/calc LANG_FORGE=../../../dist/lang-forge run

Example Makefiles default to LANG_FORGE_VERBOSITY=1, so generation prints major LangForge stages on stderr. Use LANG_FORGE_VERBOSITY=0 for quiet runs, or LANG_FORGE_VERBOSITY=2 or LANG_FORGE_VERBOSITY=3 while debugging grammars and parser tables.

Generated example output is intentionally ignored by version control. Use these commands to run the suite and confirm that the examples return to source-only form afterwards:

make examples-test
make examples-run
make examples-cleanliness

If you do not want to install a binary, a local Docker image can be used as the LangForge command:

make docker-build
docker run --rm -v "$PWD:/workspace:ro" -w /workspace lang-forge:dev \
  validate --spec examples/go/calc/calc.lf

Build, CI, Release, And Docker

Build, CI, release, and Docker targets are available through the root Makefile:

make ci
make fuzz-smoke
make golden-stability
make examples-testdata
make examples-templates
make dist VERSION=0.1.0
make docker-build
make docker-smoke

See Build, pipeline, and Docker and Invocation and layout patterns for CI, release artifacts, Docker usage, Makefile patterns, and multi-parser project layouts.

Current Status And Limits

Implemented

  • Combined .lf specification parsing.
  • Legacy split .l plus .y parsing for curated UCDT-derived regression fixtures.
  • Regex parsing, character-class partitioning, NFA-to-DFA construction, and DFA minimization.
  • LR(0), SLR(1), LALR(1), IELR(1), and canonical LR(1) parser-table construction with conflict reporting.
  • CLI commands: version, validate, inspect, and generate.
  • Optional CLI verbosity for validation, generation, automata decisions, and parser-table traces.
  • Named RHS labels, target-specific semantic type declarations, generated typed reducer contexts/adapters, and reducer coverage validation.
  • Deterministic langforge.actions.json, langforge.manifest.json, and langforge.tables.json files.
  • Example parity gates comparing grammar shape and action-manifest contracts across Go, C#, C, and C++.
  • Generated Go, C#, C, and C++ scanner/parser backends with UTF-8 checking, reducer hooks, semantic action IDs/enums, and token-source parsing.
  • Validation for empty-matching lexer rules, token/nonterminal name collisions, parser conflicts, invalid Unicode scalar ranges, and unsupported scanner settings.
  • Grammar-directed parser recovery with expected-token diagnostics and cross-target recovery APIs.
  • Language-grouped examples and copyable templates for Go, C#, C, and C++.

Planned

  • Additional source encodings beyond checked UTF-8.
  • More debug tracing and developer-facing automata explanations.
  • Optional AST helper generation.
  • Additional parity checks and reusable templates as the examples mature.

Compatibility Notes

  • LALR(1) is the default parser algorithm. SLR(1), IELR(1), and canonical LR(1) can be selected with %type slr, %type ielr, or %type canonical.
  • Scanners default to checked UTF-8 and sparse Unicode scalar ranges for the in-process engine plus generated Go, C#, C, and C++ output. See Scanner encoding architecture.
  • Pull-based token sources are the preferred production API. Older collection APIs such as Tokenize, All, Parse(tokens, ...), and target equivalents remain supported for tests, debugging, token reports, and simple examples.
  • Specs can use reducer callbacks with generated action IDs/enums across all targets. Go also has an advanced inline action mode for projects that need target-tagged semantic imports.

Background / UCDT Inspiration

LangForge was inspired by the older Pascal UCDT project. UCDT is useful historical context and influenced the starting point, but LangForge is not trying to preserve DOS-era compatibility. The current design uses a target-neutral core, modern generated APIs, typed reducers, cross-target examples, and public documentation intended to work as compiler-learning material.

Agent Skills

Reusable Codex skills for LangForge live under skills:

  • langforge-spec-authoring for .lf and legacy .l/.y grammar work.
  • langforge-example-runner for generated example projects and demo runs.
  • langforge-project-steward for reviews, hardening, and project-memory updates when private notes are present.

License

LangForge is released under the MIT License.