LBG-Agent

A heuristic learning agent for trading, inspired by Jiayi Weng's Learning Beyond Gradients.

LBG-Agent uses an LLM as a strategy editor under a strict sealed-evaluation protocol. The LLM proposes incremental edits to a SPY daily strategy; a deterministic Python control plane (the Orchestrator) receives each proposal and decides the trial's fate by running invariants, backtests, and a multi-objective gate. No LLM ever touches the sealed test window.

Documentation

PROPOSAL.html — source-of-truth specification for architecture, invariants, and verdict criteria
docs/STAGE1_REPORT.html — Stage 1 implementation retrospective (in Chinese, illustrated)
CLAUDE.md — guidance for Claude Code sessions

What is this

The framework runs a multi-trial Discovery loop. Each trial:

1. The Editor (LLM) proposes one structured edit to strategy.yaml or indicators/, with an explicit hypothesis and an expected outcome.
2. The Orchestrator validates the proposal against a five-layer invariant stack (Pydantic schema, AST static checks, restricted-namespace sandbox, prefix_stability dynamic check, gate-side runtime checks).
3. Two backtests run (train + validation). The ValidationGate decides accept or reject from a small set of categorical signals.
4. The Reflector (LLM) explains the outcome and updates four semantic memory documents. It can never override the mechanically computed hypothesis outcome; that is the project's main defense against sycophancy.
5. Every ten accepted trials, the Curator (LLM, shadow mode) compresses the four memory documents.
6. After the configured budget, the sealed test window opens exactly once and the H1 verdict is computed via a moving block bootstrap.
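
The prefix_stability check in step 2 is not spelled out in this README. The sketch below shows one plausible reading, assuming it means that an indicator's value at bar t must not change when later bars are appended (no look-ahead). The function names `sma`, `centered_mean`, and `is_prefix_stable` are illustrative and are not taken from lbg/invariants/.

```python
import math
from typing import Callable, List, Sequence


def sma(prices: Sequence[float], window: int = 3) -> List[float]:
    """Causal simple moving average; NaN until the window is full."""
    out = []
    for t in range(len(prices)):
        if t + 1 < window:
            out.append(math.nan)
        else:
            out.append(sum(prices[t + 1 - window : t + 1]) / window)
    return out


def centered_mean(prices: Sequence[float], window: int = 3) -> List[float]:
    """Deliberately leaky: the value at bar t also averages in future bars."""
    half = window // 2
    out = []
    for t in range(len(prices)):
        lo, hi = max(0, t - half), min(len(prices), t + half + 1)
        out.append(sum(prices[lo:hi]) / (hi - lo))
    return out


def is_prefix_stable(
    indicator: Callable[[Sequence[float]], List[float]],
    prices: Sequence[float],
    tol: float = 1e-12,
) -> bool:
    """Return True if every prefix run agrees with the full-history run."""
    full = indicator(prices)
    for n in range(1, len(prices) + 1):
        partial = indicator(prices[:n])
        for a, b in zip(partial, full[:n]):
            both_nan = math.isnan(a) and math.isnan(b)
            # A NaN on only one side, or a numeric gap above tol, is a failure.
            if not both_nan and not abs(a - b) <= tol:
                return False
    return True


PRICES = [100.0, 101.0, 99.5, 102.0, 103.5, 101.0]
```

With these toy prices, `is_prefix_stable(sma, PRICES)` holds while `is_prefix_stable(centered_mean, PRICES)` does not, because the centered average at bar 0 changes as soon as bar 1 arrives.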

The three LLM agents

| Agent | When | Input | Output |
| --- | --- | --- | --- |
| Editor | Start of every trial | current `strategy.yaml`, recent trial summaries (categorical val signal only), four semantic memory `.md` files, active skills | one YAML proposal: edit type + hypothesis + expected train/val signals + fallback |
| Reflector | After the gate decides | proposal, mechanical `hypothesis_outcome`, actual `validation_signal`, train metrics | mechanical explanation + incremental bullet updates to the four `.md` files |
| Curator | Every 10 accepted trials (shadow mode) | the four `.md` files in full | compressed, deduplicated rewrites of the four `.md` files |
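
To make the "mechanical `hypothesis_outcome`" idea concrete, here is a minimal sketch of how an outcome could be derived purely from the Editor's expected signals and the observed ones, so that the Reflector's text can explain the result but not alter it. The category names (`improved`, `flat`, `degraded`, `confirmed`, `partial`, `refuted`) and both functions are assumptions for illustration; the real enums live in lbg/schemas.py and may differ.

```python
from enum import Enum


class Signal(str, Enum):
    """Hypothetical categorical signal; stands in for the project's StrEnums."""
    IMPROVED = "improved"
    FLAT = "flat"
    DEGRADED = "degraded"


class HypothesisOutcome(str, Enum):
    """Hypothetical outcome labels."""
    CONFIRMED = "confirmed"
    PARTIAL = "partial"
    REFUTED = "refuted"


def score_hypothesis(
    expected_train: Signal,
    expected_val: Signal,
    actual_train: Signal,
    actual_val: Signal,
) -> HypothesisOutcome:
    """Count how many of the two expectations matched; no LLM involved."""
    hits = int(expected_train == actual_train) + int(expected_val == actual_val)
    return {
        2: HypothesisOutcome.CONFIRMED,
        1: HypothesisOutcome.PARTIAL,
        0: HypothesisOutcome.REFUTED,
    }[hits]


def build_reflector_record(outcome: HypothesisOutcome, llm_explanation: str) -> dict:
    """The outcome field comes from the scorer; the LLM only supplies prose."""
    return {"hypothesis_outcome": outcome.value, "explanation": llm_explanation}
```

Because the outcome is computed before the Reflector runs and is copied into the record as-is, a flattering explanation cannot turn a refuted hypothesis into a confirmed one.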

Repository layout

backtest.py               # vectorized SPY daily backtest (~150 LOC)
policy_interpreter.py     # deterministic policy engine
strategy.yaml             # initial DSL config (sma_cross_baseline)

indicators/               # agent-authored pure-function indicators
lbg/
  builder/                # CandidateBuilder + 8 edit type appliers
  data/                   # yfinance + Tiingo cross-check, split_A/B/C aliasing
  dsl/                    # Pydantic schemas for strategy.yaml
  gate/                   # ValidationGate, HypothesisScorer, complexity
  git_manager.py          # one commit per trial, one branch per Curator cycle
  invariants/             # AST checks + prefix_stability
  knowledge/              # external factor knowledge base (read-only)
  memory/                 # MemoryManager (jsonl + md)
  orchestrator/           # ContextBuilder, RoleRunner, Curator, Discovery, prompts/
  parser/                 # ProposalParser, per-edit-type payloads
  sandbox/                # restricted-namespace exec + SIGALRM timeout
  schemas.py              # EditProposal, TrialRecord, 6 StrEnums
  sealed_vault.py         # write-once container for sealed_test_final.json
  skills/                 # SkillManager (skill_id.yaml store)
  stage2/                 # forward validation engine
  stage3/                 # PaperTradingEngine (bar-by-bar streaming)
  verdict/                # H1 moving block bootstrap + analysis_plan

scripts/
  long_discovery.py       # CLI entry for a multi-trial Discovery run
  migrate_factors.py      # one-shot import of the factor knowledge base
  reshuffle_proposal.py

tests/                    # 330+ tests, pytest-driven
docs/                     # human-facing reports
artifacts/                # runtime: sealed/, reports/  (gitignored)
memory/                   # runtime: per-run jsonl + md  (auditable)
runs/                     # runtime: per-trial editor.yaml + reflector.yaml
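
The layout above describes sealed_vault.py as a write-once container. As a rough illustration of that idea (not the project's actual code), the sketch below relies on Python's exclusive-creation file mode so that a second write to the same path fails. `SealedVaultError`, `write_once`, and the placeholder payload are hypothetical names; the real schema of sealed_test_final.json is not shown in this README.

```python
import json
import tempfile
from pathlib import Path


class SealedVaultError(RuntimeError):
    """Raised when something tries to write a sealed file a second time."""


def write_once(path: Path, payload: dict) -> None:
    """Create the file exactly once; refuse to overwrite an existing one."""
    try:
        # Mode "x" raises FileExistsError if the path already exists.
        with open(path, "x", encoding="utf-8") as fh:
            json.dump(payload, fh, sort_keys=True)
    except FileExistsError as exc:
        raise SealedVaultError(f"{path} is already sealed") from exc


def demo() -> str:
    """Write twice into a temporary directory and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "sealed_test_final.json"
        write_once(target, {"placeholder": True})  # placeholder payload only
        try:
            write_once(target, {"placeholder": False})
        except SealedVaultError:
            return "second write rejected"
        return "second write allowed"
```

Calling `demo()` returns `"second write rejected"`. Exclusive creation makes the check and the write a single filesystem operation, which is simpler than testing for existence first and then opening the file.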

Quick start

The project uses uv for dependencies and ruff for lint + format.

# 1. install
uv sync

# 2. configure secrets (.env is gitignored)
cat > .env <<'EOF'
TIINGO_TOKEN=<your tiingo token>          # for data cross-check
ANTHROPIC_API_KEY=<your anthropic key>    # default LLM provider
MIMO_API_KEY=<your mimo key>              # optional: MIMO provider
EOF

# 3. fetch SPY data (one-time; persists to data/spy_daily.parquet)
uv run python -m lbg.data.loader fetch

# 4. run a short Discovery (writes to /tmp/lbg_run/, doesn't touch this repo)
uv run python scripts/long_discovery.py --budget 5 --out /tmp/lbg_run

# 5. inspect the artifacts
open /tmp/lbg_run/artifacts/reports/discovery_report.html   # macOS; use xdg-open on Linux
cat /tmp/lbg_run/artifacts/sealed/sealed_test_final.json

Switching LLM providers

Two providers are registered out of the box: Anthropic (default) and MIMO (Anthropic-compatible REST endpoint).

LBG_PROVIDER=mimo uv run python scripts/long_discovery.py --budget 5 --out /tmp/lbg_mimo

Or in code:

from lbg.orchestrator import RoleRunner
runner = RoleRunner(provider="mimo")          # or provider="anthropic"

The Orchestrator is provider-agnostic. Both providers produce bit-identical sealed verdicts on the same baseline strategy, by design.
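
One way to picture the provider switch is a small registry keyed by provider name. The sketch below is an assumption about how such a lookup could work, not the code inside `RoleRunner`: `ProviderConfig`, `PROVIDERS`, and `resolve_provider` are hypothetical, and the precedence (explicit argument, then `LBG_PROVIDER`, then the Anthropic default) is inferred from the two usage forms shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical per-provider settings."""
    name: str
    api_key_env: str  # name of the environment variable holding the key


PROVIDERS = {
    "anthropic": ProviderConfig("anthropic", "ANTHROPIC_API_KEY"),
    "mimo": ProviderConfig("mimo", "MIMO_API_KEY"),
}


def resolve_provider(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> ProviderConfig:
    """Pick a provider: explicit argument, then LBG_PROVIDER, then the default."""
    env = os.environ if env is None else env
    name = (explicit or env.get("LBG_PROVIDER") or "anthropic").lower()
    try:
        return PROVIDERS[name]
    except KeyError:
        known = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"unknown provider {name!r}; expected one of: {known}") from None
```

For example, `resolve_provider(env={"LBG_PROVIDER": "mimo"}).api_key_env` evaluates to `"MIMO_API_KEY"`, while an empty environment falls back to the Anthropic entry.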

Testing

uv run pytest -q                # 330+ tests; skips live LLM tests if keys absent
uv run ruff check               # lint
uv run ruff format --check      # format

Live LLM tests (test_editor_live_*, test_reflector_live_one_call, test_curator_live_one_call, test_editor_live_via_mimo_provider) are skipped automatically when ANTHROPIC_API_KEY / MIMO_API_KEY are not set. They make real API calls and cost a small amount per run.

Stage 2 + Stage 3

After a frozen strategy artifact is produced by Stage 1, the same backtest engine drives two downstream stages, neither of which calls an LLM:

Stage 2 — forward validation: lbg.stage2.run_forward_validation runs a deterministic backtest on post-sealed bars with a loose go/no-go gate (min trades, max drawdown floor, min Sharpe).
Stage 3 — paper trading: lbg.stage3.PaperTradingEngine streams daily bars one at a time, recomputes positions on the growing history buffer, and logs realized PnL with the same cost model as the backtest.
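
The Stage 2 go/no-go gate can be pictured as three independent threshold checks. The sketch below is illustrative only: the threshold values, the trade-counting rule, and the names `GateThresholds` and `forward_gate` are assumptions, and the real signature of `lbg.stage2.run_forward_validation` is not documented here.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class GateThresholds:
    """Hypothetical thresholds; the project's actual values are not shown."""
    min_trades: int = 5
    max_drawdown_floor: float = -0.25  # drawdown must stay at or above this
    min_sharpe: float = 0.0


def max_drawdown(returns: Sequence[float]) -> float:
    """Worst peak-to-trough loss of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def annualized_sharpe(returns: Sequence[float], periods: int = 252) -> float:
    """Mean over sample standard deviation, scaled by sqrt(periods)."""
    n = len(returns)
    if n < 2:
        return 0.0
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return 0.0 if var == 0 else mean / math.sqrt(var) * math.sqrt(periods)


def count_trades(positions: Sequence[int]) -> int:
    """Count position changes, starting from a flat (0) position."""
    prev, trades = 0, 0
    for p in positions:
        trades += int(p != prev)
        prev = p
    return trades


def forward_gate(
    returns: Sequence[float],
    positions: Sequence[int],
    th: GateThresholds = GateThresholds(),
) -> Tuple[bool, Dict[str, bool]]:
    """Return (go, per-check results); go is True only if every check passes."""
    checks = {
        "min_trades": count_trades(positions) >= th.min_trades,
        "max_drawdown": max_drawdown(returns) >= th.max_drawdown_floor,
        "min_sharpe": annualized_sharpe(returns) >= th.min_sharpe,
    }
    return all(checks.values()), checks
```

Returning the per-check dictionary alongside the boolean keeps a no-go decision explainable: a report can state which threshold failed without rerunning the backtest.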

Stage 4 (live capital) is intentionally out of scope.

Project status

| Stage | Status |
| --- | --- |
| Stage 1 — Discovery loop (LLM-driven) | done, 13 Orchestrator submodules, all 8 edit types |
| Stage 2 — forward validation (no LLM) | done |
| Stage 3 — paper trading (no LLM) | done, simulation only |
| Stage 4 — live capital broker adapter | out of scope |
| Active Curator mode | not enabled (shadow only) |

Experimental result on the SMA(20/50) baseline (both providers, budget=20): H1 strong = False, H1 weak = False. The strategy underperforms buy-and-hold on the sealed window by 0.75 Sharpe units; the framework reported this cleanly. See docs/STAGE1_REPORT.html § 7 – § 8.
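
For readers unfamiliar with the technique behind the H1 verdict, the following is a generic moving block bootstrap for the Sharpe difference between a strategy and a benchmark. It is a sketch under stated assumptions rather than the code in lbg/verdict/: the block length, replicate count, percentile interval, and function names are all illustrative, and the README does not define the H1 strong and weak criteria.

```python
import math
import random
from typing import List, Sequence, Tuple


def sharpe(returns: Sequence[float], periods: int = 252) -> float:
    """Annualized Sharpe ratio with sample standard deviation."""
    n = len(returns)
    if n < 2:
        return 0.0
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return 0.0 if var == 0 else mean / math.sqrt(var) * math.sqrt(periods)


def moving_block_indices(n: int, block_len: int, rng: random.Random) -> List[int]:
    """Draw overlapping blocks with replacement until n indices are collected."""
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and n")
    idx: List[int] = []
    while len(idx) < n:
        start = rng.randint(0, n - block_len)  # inclusive on both ends
        idx.extend(range(start, start + block_len))
    return idx[:n]


def bootstrap_sharpe_diff(
    strategy: Sequence[float],
    benchmark: Sequence[float],
    block_len: int = 10,
    n_boot: int = 500,
    seed: int = 0,
) -> Tuple[float, float]:
    """Approximate 95% percentile interval for Sharpe(strategy) - Sharpe(benchmark)."""
    if len(strategy) != len(benchmark):
        raise ValueError("series must have equal length")
    rng = random.Random(seed)
    n = len(strategy)
    diffs = []
    for _ in range(n_boot):
        # The same indices are used for both series so days stay paired.
        idx = moving_block_indices(n, block_len, rng)
        s = [strategy[i] for i in idx]
        b = [benchmark[i] for i in idx]
        diffs.append(sharpe(s) - sharpe(b))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]
```

Resampling contiguous blocks instead of single days preserves short-range autocorrelation in daily returns, which an ordinary i.i.d. bootstrap would destroy. If the whole interval lies below zero, the strategy's Sharpe ratio is credibly below the benchmark's on that window.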

License

MIT — see LICENSE.
