LangForge is a scanner/parser generator for building DSLs, validators,
transpilers, compilers, code generators, and language tooling from one readable
.lf grammar file.
Write the scanner and parser once, then generate target-native code for:
- Go
- C#
- C
- C++
LangForge is designed around typed reducer APIs, multi-target generation, parser recovery, expected-token diagnostics, and a clean separation between generated automata and handwritten application logic.
When you build a small language or DSL, you usually need to:
- split source text into tokens;
- check that the token sequence follows a grammar;
- turn recognized syntax into useful application objects;
- report helpful syntax errors;
- keep generated parsing code separate from handwritten business logic.
LangForge generates the scanner and parser machinery. Your code owns the meaning: AST nodes, reducers, validation, compilation, interpretation, rendering, reporting, or whatever else your language is meant to do.
LangForge is useful for:
- small DSLs;
- configuration languages;
- validators;
- code generators;
- transpilers;
- educational compilers;
- report or query languages;
- parser experiments;
- language-tooling prototypes.
The repository includes calculator expressions, a DataKeeper-style scripting DSL, the DRAW rendering language, vehicle-report parsing, parser recovery demos, mini-compiler templates, reusable library-style DSL templates, a modern C# layered compiler template, and a layered modern C++ compiler template with a parser facade and CMake build.
A grammar rule can label right-hand-side values:
```
%semantic go Expr float64
%semantic go Term float64

Expr
    : left=Expr Plus right=Term {go: add}
    | value=Term                {go: pass}
    ;
```
LangForge uses those labels and semantic type declarations to generate typed reducer contexts:
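```go
// parser is the Go package generated from the grammar above.
reducers := parser.ReducerMap{
	parser.SemanticActionAdd: parser.TypedAdd(
		func(ctx parser.AddReduction) (float64, error) {
			// ctx.Left and ctx.Right come from the left= and right= labels.
			return ctx.Left + ctx.Right, nil
		},
	),
}
```

Handwritten reducer code can use `ctx.Left` and `ctx.Right` instead of manual positions such as `ctx.Values[0]` and `ctx.Values[2]`.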
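To show what a typed adapter saves you from, here is a standalone sketch of the idea. `Reduction`, `AddReduction`, and `typedAdd` are hypothetical stand-ins written for this illustration, not the generated LangForge types: the adapter knows once that `left` sits at stack position 0 and `right` at position 2, and the handwritten callback only ever sees named fields.

```go
package main

import (
	"errors"
	"fmt"
)

// Reduction is a hypothetical untyped reduction: the values popped from the
// parser stack for one rule, in right-hand-side order.
type Reduction struct {
	Values []any
}

// AddReduction is a hypothetical typed view of `Expr: left=Expr Plus right=Term`.
type AddReduction struct {
	Left  float64
	Right float64
}

// typedAdd adapts a typed callback to the untyped stack shape. The label
// positions (0 for left, 2 for right; 1 is the Plus token) live here once,
// so handwritten reducers never index Values directly.
func typedAdd(fn func(AddReduction) (float64, error)) func(Reduction) (any, error) {
	return func(r Reduction) (any, error) {
		if len(r.Values) != 3 {
			return nil, fmt.Errorf("add: expected 3 values, got %d", len(r.Values))
		}
		left, okLeft := r.Values[0].(float64)
		right, okRight := r.Values[2].(float64)
		if !okLeft || !okRight {
			return nil, errors.New("add: unexpected value types")
		}
		return fn(AddReduction{Left: left, Right: right})
	}
}

func main() {
	add := typedAdd(func(ctx AddReduction) (float64, error) {
		return ctx.Left + ctx.Right, nil
	})
	// Simulated stack contents for "2 + 3": Expr value, Plus token, Term value.
	v, err := add(Reduction{Values: []any{2.0, "+", 3.0}})
	fmt.Println(v, err) // 5 <nil>
}
```

Generating the adapter also means a grammar change that moves a label shows up as a compile error in the adapter signature rather than as a silent wrong index.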
```
source text
  -> generated scanner / token source
  -> generated LR parser
  -> typed reducer
  -> AST / model / command / report
  -> compiler / interpreter / renderer / validator
```
The preferred production path is pull-based and lazy: a generated scanner feeds tokens to the generated parser as the parser asks for them. Collection-based token APIs are still available for debugging, teaching, tests, and token-stream inspection.
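A minimal sketch of the pull-based shape, assuming a hypothetical `TokenSource` interface with a single `Next` method (the generated LangForge API may use different names). The parser asks for one token at a time, so the scanner never runs ahead of it; `collect` shows the collection-style alternative that drains everything first.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
)

// Token and TokenSource are hypothetical shapes for this sketch.
type Token struct {
	Kind string // "num", "plus", or "eof"
	Text string
}

type TokenSource interface {
	Next() (Token, error)
}

// scanner produces exactly one token per Next call; nothing is scanned ahead.
type scanner struct {
	src []rune
	pos int
}

func isDigit(r rune) bool { return '0' <= r && r <= '9' }

func (s *scanner) Next() (Token, error) {
	for s.pos < len(s.src) && unicode.IsSpace(s.src[s.pos]) {
		s.pos++
	}
	if s.pos >= len(s.src) {
		return Token{Kind: "eof"}, nil
	}
	start := s.pos
	switch r := s.src[s.pos]; {
	case r == '+':
		s.pos++
		return Token{Kind: "plus", Text: "+"}, nil
	case isDigit(r):
		// Longest match: consume every following digit.
		for s.pos < len(s.src) && isDigit(s.src[s.pos]) {
			s.pos++
		}
		return Token{Kind: "num", Text: string(s.src[start:s.pos])}, nil
	default:
		return Token{}, fmt.Errorf("unexpected %q at rune offset %d", r, s.pos)
	}
}

// parseSum pulls tokens on demand for the grammar: num (plus num)* eof.
func parseSum(ts TokenSource) (int, error) {
	total := 0
	for {
		t, err := ts.Next()
		if err != nil {
			return 0, err
		}
		if t.Kind != "num" {
			return 0, fmt.Errorf("expected num, got %s", t.Kind)
		}
		n, err := strconv.Atoi(t.Text)
		if err != nil {
			return 0, err
		}
		total += n
		if t, err = ts.Next(); err != nil {
			return 0, err
		}
		switch t.Kind {
		case "eof":
			return total, nil
		case "plus":
			continue
		default:
			return 0, fmt.Errorf("expected plus or eof, got %s", t.Kind)
		}
	}
}

// collect drains a source into a slice: useful for tests and token dumps.
func collect(ts TokenSource) ([]Token, error) {
	var out []Token
	for {
		t, err := ts.Next()
		if err != nil {
			return out, err
		}
		out = append(out, t)
		if t.Kind == "eof" {
			return out, nil
		}
	}
}

func main() {
	sum, err := parseSum(&scanner{src: []rune("12 + 3 + 4")})
	fmt.Println(sum, err) // 19 <nil>
}
```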
- One `.lf` file for scanner and parser definitions.
- Generated scanner/parser code for Go, C#, C, and C++.
- LR parser modes: SLR(1), LALR(1), IELR(1), and canonical LR(1).
- Named grammar values such as `left=Expr` and `right=Term`.
- Typed reducer contexts/adapters instead of manual parser-stack indexing.
- Pull-based token-source parsing for lazy scanner-to-parser pipelines.
- Parser error recovery with expected-token diagnostics.
- Deterministic `langforge.actions.json` action manifests.
- Copyable mini-compiler, library-style DSL, and modern C#/C++ layered templates.
- Example parity gates for cross-target grammar and semantic-contract drift.
Validate a grammar:

```sh
go run ./cmd/lang-forge validate --spec examples/go/calc/calc.lf
```

Inspect parser tables:

```sh
go run ./cmd/lang-forge inspect --spec examples/go/calc/calc.lf --format text
```

Run a Go example:

```sh
make -C examples/go/calc run
```

Run a reusable library-style template:

```sh
make -C examples/templates/go/library-dsl test
```

If `go` is not on your `PATH`, use the full path to your Go toolchain. The
current development workspace uses `/usr/local/go/bin/go`.
| Goal | Start here |
|---|---|
| Learn the basics | examples/go/calc |
| Build a small compiler pipeline | examples/templates/go/mini-compiler |
| Build a reusable DSL library | examples/templates/go/library-dsl |
| Build a layered C# compiler facade | examples/templates/csharp/layered-compiler |
| Build a layered C++ compiler facade | examples/templates/cpp/layered-compiler |
| See parser recovery | examples/go/parser-recovery |
| See a renderer-style language | examples/go/draw |
| Compare target languages | examples |
| Understand generated semantics | doc/generated-code-and-semantics.md |
| Understand handwritten integration | doc/handwritten-integration-guide.md |
| Target | Generated output | Semantic API | Notes |
|---|---|---|---|
| Go | tokens.go, scanner.go, parser.go | typed reducer contexts, reducer maps | primary workflow and richest examples |
| C# | Tokens.g.cs, Scanner.g.cs, Parser.g.cs | typed reducer contexts, action enums | nullable-aware .g.cs output |
| C | tokens.h, scanner.h/.c, parser.h/.c, parser_typed.h | typed reducer structs, function pointers | reentrant APIs and explicit ownership |
| C++ | tokens.hpp, scanner.hpp/.cpp, parser.hpp/.cpp, parser_typed.hpp | typed adapters and reducer maps | C++17 output |
All targets also write deterministic manifest files, including
langforge.actions.json, so examples and downstream projects can verify the
semantic contract produced from a grammar.
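One way a downstream project might use these manifests is a drift check between targets. The sketch below assumes a hypothetical, simplified shape for `langforge.actions.json` (a top-level `actions` array of objects with a `name` field); the real schema may differ, so check a generated file before adapting it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// manifest models a hypothetical subset of langforge.actions.json.
type manifest struct {
	Actions []struct {
		Name string `json:"name"`
	} `json:"actions"`
}

// actionNames returns the sorted action names from one manifest document.
func actionNames(data []byte) ([]string, error) {
	var m manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(m.Actions))
	for _, a := range m.Actions {
		names = append(names, a.Name)
	}
	sort.Strings(names)
	return names, nil
}

// drift returns the sorted names that appear in exactly one of the two lists.
func drift(a, b []string) []string {
	seen := map[string]int{}
	for _, n := range a {
		seen[n] |= 1
	}
	for _, n := range b {
		seen[n] |= 2
	}
	var out []string
	for n, mask := range seen {
		if mask != 3 {
			out = append(out, n)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	goNames, err := actionNames([]byte(`{"actions":[{"name":"add"},{"name":"pass"}]}`))
	if err != nil {
		panic(err)
	}
	cNames, err := actionNames([]byte(`{"actions":[{"name":"add"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(drift(goNames, cNames)) // [pass]
}
```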
The `examples` directory contains runnable projects that show LangForge in several language families.
examples/templates are copyable starting points. The
mini-compiler templates show a small front end, stack-machine lowering, and
mock execution. The library-style DSL templates hide generated parser details
behind a stable domain API, which is the recommended shape for real tools. The
C# layered compiler template shows Ast/, Semantics/, Parsing/, a public
IMiniCompilerParser, domain ParseResult<T>, and DI-friendly semantic
policy injection. The C++ layered compiler template goes one step further with
public headers under include/, generated output isolated under generated/,
direct typed reducers, intentional std::unique_ptr/std::variant ownership,
a domain parser facade, and CMake integration.
LangForge builds LR parser automata and reports conflicts with source spans where possible. LALR(1) is the default because it is compact and familiar, but grammars can select other algorithms:
- `%type slr` for SLR(1);
- `%type lalr` for LALR(1);
- `%type ielr` for IELR(1);
- `%type canonical` for canonical LR(1).
LR(0) item sets are part of the implementation and documentation, even when the selected parser table uses a lookahead-aware algorithm. See Parser algorithms for worked examples, automata shape, conflict behavior, and when to choose each mode.
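As a reminder of what the generated tables drive at runtime, here is a hand-built SLR(1) example for a toy grammar. The table encoding is hypothetical and is not LangForge's generated format; the shift/reduce/goto loop is the standard LR driver. States 2 and 4 reduce on FOLLOW(E) = {`+`, `$`}, which is what makes these tables SLR(1).

```go
package main

import "fmt"

// Toy grammar:
//   0: S' -> E
//   1: E  -> E "+" n
//   2: E  -> n

type actionKind int

const (
	shift actionKind = iota + 1
	reduce
	accept
)

type action struct {
	kind actionKind
	arg  int // target state for shift, rule number for reduce
}

type tok struct {
	kind string // "n", "+", or "$" (end of input)
	val  int
}

var actions = map[int]map[string]action{
	0: {"n": {shift, 2}},
	1: {"+": {shift, 3}, "$": {accept, 0}},
	2: {"+": {reduce, 2}, "$": {reduce, 2}},
	3: {"n": {shift, 4}},
	4: {"+": {reduce, 1}, "$": {reduce, 1}},
}

var gotoE = map[int]int{0: 1}   // goto entries for nonterminal E
var ruleLen = map[int]int{1: 3, 2: 1} // right-hand-side length per rule

// parse runs the LR loop; input must end with a "$" token.
func parse(input []tok) (int, error) {
	states := []int{0}
	values := []int{0} // semantic stack parallel to states; index 0 is a placeholder
	i := 0
	for {
		state := states[len(states)-1]
		la := input[i]
		act, ok := actions[state][la.kind]
		if !ok {
			return 0, fmt.Errorf("syntax error at token %d (%q) in state %d", i, la.kind, state)
		}
		switch act.kind {
		case shift:
			states = append(states, act.arg)
			values = append(values, la.val)
			i++
		case reduce:
			n := ruleLen[act.arg]
			rhs := values[len(values)-n:]
			result := rhs[0] // E -> n
			if act.arg == 1 {
				result = rhs[0] + rhs[2] // E -> E "+" n
			}
			states = states[:len(states)-n]
			values = values[:len(values)-n]
			states = append(states, gotoE[states[len(states)-1]])
			values = append(values, result)
		case accept:
			return values[len(values)-1], nil
		}
	}
}

func main() {
	input := []tok{{"n", 1}, {"+", 0}, {"n", 2}, {"+", 0}, {"n", 3}, {"$", 0}}
	fmt.Println(parse(input)) // 6 <nil>
}
```

A missing action-table entry is exactly the point where a generated parser would report the expected tokens for the current state.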
Generated parsers support grammar-directed recovery with reserved error
productions, synchronization terminals, expected-token aliases and groups,
partial results, and structured diagnostics. Recovery examples are available
for all generated targets:

- examples/go/parser-recovery
- examples/csharp/parser-recovery
- examples/c/parser-recovery
- examples/cpp/parser-recovery

See Parser error recovery for the grammar patterns and generated APIs.
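The core idea of synchronization-based recovery can be shown without an LR parser. This is a simplified panic-mode sketch for a hypothetical statement form `ident "=" num ";"`, not LangForge's error-production mechanism or its generated API: on a mismatch it records the expected token, skips to the next `;` synchronization terminal, and keeps the statements that did parse.

```go
package main

import "fmt"

type Tok struct {
	Kind, Text string
}

type Diagnostic struct {
	Index    int      // token index where the error was detected
	Got      string   // kind of the unexpected token
	Expected []string // token kinds that would have been accepted
}

type Assign struct{ Name, Value string }

// parseStmts expects toks to end with an "eof" token.
func parseStmts(toks []Tok) ([]Assign, []Diagnostic) {
	var out []Assign
	var diags []Diagnostic
	pattern := []string{"ident", "=", "num", ";"}
	i := 0
	for i < len(toks) && toks[i].Kind != "eof" {
		start := i
		ok := true
		for _, want := range pattern {
			if toks[i].Kind != want {
				diags = append(diags, Diagnostic{Index: i, Got: toks[i].Kind, Expected: []string{want}})
				ok = false
				break
			}
			i++
		}
		if ok {
			out = append(out, Assign{Name: toks[start].Text, Value: toks[start+2].Text})
			continue
		}
		// Synchronize: skip up to and including the next ";" terminal.
		for i < len(toks) && toks[i].Kind != "eof" && toks[i].Kind != ";" {
			i++
		}
		if i < len(toks) && toks[i].Kind == ";" {
			i++
		}
	}
	return out, diags
}

func main() {
	toks := []Tok{
		{"ident", "a"}, {"=", "="}, {"num", "1"}, {";", ";"},
		{"ident", "b"}, {"=", "="}, {";", ";"}, // missing value
		{"ident", "c"}, {"=", "="}, {"num", "3"}, {";", ";"},
		{"eof", ""},
	}
	stmts, diags := parseStmts(toks)
	fmt.Println(stmts) // [{a 1} {c 3}]
	for _, d := range diags {
		fmt.Printf("token %d: got %s, expected %v\n", d.Index, d.Got, d.Expected)
	}
}
```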
Install the latest release binary with curl:

```sh
curl -fsSL https://github.com/russlank/lang-forge/releases/latest/download/install-lang-forge.sh | sh
```

Or with wget:

```sh
wget -qO- https://github.com/russlank/lang-forge/releases/latest/download/install-lang-forge.sh | sh
```

The installer detects the supported OS/architecture pair, downloads the
matching release binary, verifies `SHA256SUMS`, and installs `lang-forge` to
`${PREFIX:-/usr/local}/bin`. Use a user-writable directory when you do not want
`sudo`:

```sh
curl -fsSL https://github.com/russlank/lang-forge/releases/latest/download/install-lang-forge.sh \
  | LANG_FORGE_INSTALL_DIR="$HOME/.local/bin" sh
```

Set `LANG_FORGE_REPO_URL` when installing from a fork or mirror that publishes
the same release asset names.
From a source checkout, you can also run the tool directly:

```sh
go run ./cmd/lang-forge version
```

The core tool needs Go 1.26.4 or a compatible newer toolchain plus `make`.
The full example and CI suite also needs the .NET 10.0 SDK for C# examples,
GCC or another C11 compiler for C examples and Go race tests, and a C++17
compiler for C++ examples.
See Requirements for the complete toolchain matrix and target-specific notes.
The main demos exist in Go, C#, C, and C++:
| Example | Go | C# | C | C++ |
|---|---|---|---|---|
| Calculator | `make -C examples/go/calc run` | `make -C examples/csharp/calc run` | `make -C examples/c/calc run` | `make -C examples/cpp/calc run` |
| DataKeeper DSL | `make -C examples/go/datakeeper run` | `make -C examples/csharp/datakeeper run` | `make -C examples/c/datakeeper run` | `make -C examples/cpp/datakeeper run` |
| DRAW renderer | `make -C examples/go/draw run` | `make -C examples/csharp/draw run` | `make -C examples/c/draw run` | `make -C examples/cpp/draw run` |
| Vehicle report | `make -C examples/go/vehicle-report run` | `make -C examples/csharp/vehicle-report run` | `make -C examples/c/vehicle-report run` | `make -C examples/cpp/vehicle-report run` |
| Parser recovery | `make -C examples/go/parser-recovery run` | `make -C examples/csharp/parser-recovery run` | `make -C examples/c/parser-recovery run` | `make -C examples/cpp/parser-recovery run` |
Example Makefiles run LangForge from source by default with
`go run ../../../cmd/lang-forge`. After building a standalone utility, the same
examples can use it:

```sh
make build
make -C examples/go/calc LANG_FORGE=../../../dist/lang-forge run
```

Example Makefiles default to `LANG_FORGE_VERBOSITY=1`, so generation prints
major LangForge stages on stderr. Use `LANG_FORGE_VERBOSITY=0` for quiet runs,
or `LANG_FORGE_VERBOSITY=2` or `3` while debugging grammars and parser tables.
Generated example output is intentionally ignored. Use these commands to run the suite and confirm the examples return to source-only form:

```sh
make examples-test
make examples-run
make examples-cleanliness
```

If you do not want to install a binary, a local Docker image can be used as the LangForge command:

```sh
make docker-build
docker run --rm -v "$PWD:/workspace:ro" -w /workspace lang-forge:dev \
  validate --spec examples/go/calc/calc.lf
```

Build, CI, release, and Docker targets are available through the root Makefile:

```sh
make ci
make fuzz-smoke
make golden-stability
make examples-testdata
make examples-templates
make dist VERSION=0.1.0
make docker-build
make docker-smoke
```

See Build, pipeline, and Docker and Invocation and layout patterns for CI, release artifacts, Docker usage, Makefile patterns, and multi-parser project layouts.
- Learning path
- Requirements
- Compiler pipeline
- Glossary
- Architecture
- Tool improvement roadmap
- Build, pipeline, and Docker
- Scanner encoding architecture
- Usage
- Invocation and layout patterns
- Specification format
- Generated code and semantics
- Handwritten integration guide
- Parser algorithms
- Parser error recovery
- Examples
- Example Template Guide
- UCDT legacy inspiration
- Combined `.lf` specification parsing.
- Legacy split `.l` plus `.y` parsing for curated UCDT-derived regression fixtures.
- Regex parsing, character-class partitioning, NFA-to-DFA construction, and DFA minimization.
- LR(0), SLR, LALR(1), IELR(1), and canonical LR(1) parser-table construction with conflict reporting.
- CLI commands: `version`, `validate`, `inspect`, and `generate`.
- Optional CLI verbosity for validation, generation, automata decisions, and parser-table traces.
- Named RHS labels, target-specific semantic type declarations, generated typed reducer contexts/adapters, and reducer coverage validation.
- Deterministic `langforge.actions.json`, `langforge.manifest.json`, and `langforge.tables.json` files.
- Example parity gates comparing grammar shape and action-manifest contracts across Go, C#, C, and C++.
- Generated Go, C#, C, and C++ scanner/parser backends with UTF-8 checking, reducer hooks, semantic action IDs/enums, and token-source parsing.
- Validation for empty-matching lexer rules, token/nonterminal name collisions, parser conflicts, invalid Unicode scalar ranges, and unsupported scanner settings.
- Grammar-directed parser recovery with expected-token diagnostics and cross-target recovery APIs.
- Language-grouped examples and copyable templates for Go, C#, C, and C++.
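Empty-matching lexer rules are rejected because a longest-match scanner could then emit a token without consuming any input. A rough sketch of the check, using Go's RE2 `regexp` package as a stand-in for LangForge's own regex dialect and automata (an assumption; the real validator works on its own NFA/DFA):

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesEmpty reports whether a lexer pattern can match the empty string.
// \A and \z anchor the whole input so partial matches do not count.
func matchesEmpty(pattern string) (bool, error) {
	re, err := regexp.Compile(`\A(?:` + pattern + `)\z`)
	if err != nil {
		return false, err
	}
	return re.MatchString(""), nil
}

func main() {
	for _, p := range []string{`[0-9]+`, `[0-9]*`, `a?b?`, `"[^"]*"`} {
		empty, err := matchesEmpty(p)
		fmt.Printf("%-10s empty=%v err=%v\n", p, empty, err)
	}
}
```

Here `[0-9]*` and `a?b?` would be flagged, while `[0-9]+` and the quoted-string pattern would not.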
- Additional source encodings beyond checked UTF-8.
- More debug tracing and developer-facing automata explanations.
- Optional AST helper generation.
- Additional parity checks and reusable templates as the examples mature.
- LALR(1) is the default parser algorithm. SLR, IELR(1), and canonical LR(1)
  can be selected with `%type slr`, `%type ielr`, or `%type canonical`.
- Scanners default to checked UTF-8 and sparse Unicode scalar ranges for the in-process engine plus generated Go, C#, C, and C++ output. See Scanner encoding architecture.
- Pull-based token sources are the preferred production API. Older collection
  APIs such as `Tokenize`, `All`, `Parse(tokens, ...)`, and target equivalents remain supported for tests, debugging, token reports, and simple examples.
- Specs can use reducer callbacks with generated action IDs/enums across all targets. Go also has an advanced inline action mode for projects that need target-tagged semantic imports.
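"Checked" UTF-8 means invalid input is reported rather than silently replaced. A small standalone sketch of that behavior with Go's standard `unicode/utf8` package (illustrative only; it is not the generated scanner's code): `utf8.DecodeRune` returns `RuneError` with size 1 for malformed bytes, including encoded surrogates, which are not Unicode scalar values, while a correctly encoded U+FFFD decodes with size 3 and is accepted.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// checkedRunes decodes src as UTF-8 and fails on the first invalid sequence
// instead of substituting U+FFFD.
func checkedRunes(src []byte) ([]rune, error) {
	var out []rune
	for off := 0; off < len(src); {
		r, size := utf8.DecodeRune(src[off:])
		if r == utf8.RuneError && size <= 1 {
			return nil, fmt.Errorf("invalid UTF-8 at byte offset %d", off)
		}
		out = append(out, r)
		off += size
	}
	return out, nil
}

func main() {
	r, err := checkedRunes([]byte("héllo"))
	fmt.Println(len(r), err) // 5 <nil>
	_, err = checkedRunes([]byte{'a', 0xff, 'b'})
	fmt.Println(err) // invalid UTF-8 at byte offset 1
}
```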
LangForge was inspired by the older Pascal UCDT project. UCDT is useful historical context and influenced the starting point, but LangForge is not trying to preserve DOS-era compatibility. The current design uses a target-neutral core, modern generated APIs, typed reducers, cross-target examples, and public documentation intended to work as compiler-learning material.
Reusable Codex skills for LangForge live under `skills`:

- `langforge-spec-authoring` for `.lf` and legacy `.l`/`.y` grammar work.
- `langforge-example-runner` for generated example projects and demo runs.
- `langforge-project-steward` for reviews, hardening, and project-memory updates when private notes are present.
LangForge is released under the MIT License.