G-Wang12/L2-OrderBook-Engine

L2 OrderBook High-Frequency Execution Engine

A C++20, low-latency market-data → strategy engine skeleton built around the core HFT loop: ingest L2 updates over WSS, normalize them into a small internal representation, hand off across threads via a lock-free SPSC queue, and run a busy-wait strategy loop with microsecond-level latency visibility.

The strategy layer is intentionally modular: you can run a single alpha signal or an ensemble of multiple signals in the same process (selected at startup), combine their confidence scores, and emit mock fills/PnL to CSV for rapid iteration.
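As a rough illustration of combining signals, the sketch below averages per-signal confidence scores and applies the |score| ≥ 0.6 threshold that the mock execution uses (described later in this README). The averaging rule, the [-1, 1] score range, the sign-to-side mapping, and the names `combine_scores` and `decide` are assumptions for illustration; the engine's actual combination logic lives in the C++ strategy code and may differ.

```python
TRADE_THRESHOLD = 0.6  # mock-execution threshold on |alpha score| from this README


def combine_scores(scores):
    """Average per-signal confidence scores (assumed to lie in [-1, 1])."""
    values = list(scores.values())
    if not values:
        return 0.0
    return sum(values) / len(values)


def decide(scores, threshold=TRADE_THRESHOLD):
    """Map the combined score to 'BUY', 'SELL', or 'HOLD' (assumed sign convention)."""
    combined = combine_scores(scores)
    if combined >= threshold:
        return "BUY"
    if combined <= -threshold:
        return "SELL"
    return "HOLD"
```

With two signals, a strong momentum score can be diluted by a weak ofi score, which is one reason an ensemble may trade less often than a single alpha.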

Current Status: Working end-to-end pipeline (local replay/mock tooling included) with a pluggable strategy/alpha framework. Not a production execution system yet (no real venue auth/subscribe, reconnect/ping-pong, OMS, or risk).

Features

  • C++20 with strict compiler enforcement (-Wall -Wextra -Wpedantic -Werror)
  • Performance: Release builds use -O3 -march=native, which tunes the binary to the CPU of the build machine (build on, or for, the hardware you will run on)
  • SIMD JSON parsing: simdjson on-demand parsing with a fixed-capacity buffer (no per-message allocations; one memcpy into a padded buffer)
  • Networking (WSS): Boost.Asio + Boost.Beast over OpenSSL (TLS)
  • Concurrency: dedicated network thread + dedicated strategy thread, coordinated via a lock-free SPSC queue
  • Testing: GoogleTest integration with CMake FetchContent
  • Build System: Modern CMake 3.20+ with FetchContent for dependencies
  • Modular Signals (Alphas): Strategy runs one or more pluggable alpha signals and combines their confidence scores
  • Latency Visibility: CSV logs include per-tick processing latency (latency_us) so you can quantify signal overhead
  • Polymarket-aware ingestion: Single-asset filtering for binary markets (YES/NO legs), correct assets_ids subscribe, best_bid_ask parsing, and filtered JSONL replay
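The SPSC hand-off between the network thread and the strategy thread can be pictured as a fixed-size ring with two indices, each written by exactly one side. The sketch below shows only that index scheme in Python; it is not the engine's queue (which is C++ and relies on atomic loads and stores), and the class and method names are hypothetical.

```python
class SpscRing:
    """Fixed-capacity single-producer/single-consumer ring (index scheme only)."""

    def __init__(self, capacity):
        if capacity < 2 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two >= 2")
        self._buf = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next slot to read; advanced only by the consumer
        self._tail = 0  # next slot to write; advanced only by the producer

    def try_push(self, item):
        """Producer side: return False instead of blocking when the ring is full."""
        if self._tail - self._head == len(self._buf):
            return False
        self._buf[self._tail & self._mask] = item
        self._tail += 1  # publish only after the slot is filled
        return True

    def try_pop(self):
        """Consumer side: return None when the ring is empty."""
        if self._head == self._tail:
            return None
        item = self._buf[self._head & self._mask]
        self._head += 1  # release the slot only after reading it
        return item
```

Because each index has a single writer, neither side needs a lock; the consumer can spin on `try_pop` in the same way the strategy thread busy-waits.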

Quick Start

Prerequisites

  • CMake ≥ 3.20
  • C++20 compiler (GCC 11+, Clang 13+, MSVC 2019+)
  • Boost (system + thread components) — install via:
    # macOS
    brew install boost
    # Ubuntu
    sudo apt install libboost-system-dev libboost-thread-dev
  • OpenSSL (TLS) — install via:
    # macOS
    brew install openssl@3
    # Ubuntu
    sudo apt install libssl-dev
  • Git

Build & Run

# Clone and configure
git clone <your-repo-url>
cd L2-OrderBook-Engine  # default clone directory for this repo; adjust if yours differs

# Clean previous build (important after dependency/search-path changes)
rm -rf build

# Configure (uses FetchContent for simdjson + GoogleTest)
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release

# Build (engine + tests)
cmake --build build -j$(getconf _NPROCESSORS_ONLN)

# Run the engine
# Note: with no host/port args, defaults to example.com:443 (not useful for Polymarket).
# For local dev, use mock or replay (below) and point at 127.0.0.1:8765.

# Strategy selection (default: both). On real/replay Polymarket data, ofi often fires more than momentum.
./build/engine --strategy momentum
./build/engine --strategy ofi
./build/engine --strategy both

# Polymarket asset (token) filter — default matches tools/polymarket_config.py / historical_data.jsonl
./build/engine --strategy ofi --asset-id 96351650250139397447438653380483970772060142397849794315678720298272472897874 127.0.0.1 8765 /

# Run tests
ctest --test-dir build -V

Deterministic local run (recommended)

  1. Generate a local TLS cert/key (used by the Python WSS servers):
./tools/gen_self_signed_cert.sh
  2. In one terminal, start a local websocket server:
source .venv/bin/activate
python tools/mock_wss_server.py
# or
python tools/replay_server.py
  3. In another terminal, point the engine at it:
./build/engine --strategy ofi 127.0.0.1 8765 /

If port 8765 is already in use, stop the old server (lsof -nP -iTCP:8765 -sTCP:LISTEN, then kill <pid>) or change the port in tools/replay_server.py and match it in the engine args.

Polymarket replay: why trades were zero before (and what we fixed)

Replay does not use the websocket subscribe message (the replay server ignores it). Trades failed on real JSONL for three separate reasons:

  1. Dual-leg price_change messages — Polymarket sends YES and NO token updates in one payload. The engine keeps a single order book, so applying both legs mixed two unrelated prices (e.g. 60¢ and 40¢) into one book. Alphas then saw nonsense spreads and rarely crossed the trade threshold.
  2. Replay noise — Most lines in historical_data.jsonl are new_market broadcasts (~80% of rows), not L2 updates. Replaying them unfiltered wasted time and never improved the book.
  3. Price parsing edge cases — Top-of-book fields like "best_ask":"1" mean $1.00 (100¢); the parser used to treat "1" as 1¢, breaking the spread.
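The third issue comes down to units: price strings are dollar-denominated, so "1" is 100¢. A minimal Python sketch of that conversion follows; the function name `price_to_cents` and the range check are illustrative assumptions, and the real fix lives in include/market_parser.hpp.

```python
from decimal import Decimal


def price_to_cents(text):
    """Convert a dollar-denominated price string such as "0.999" or "1" to cents."""
    cents = Decimal(text) * 100
    if not Decimal(0) <= cents <= Decimal(100):
        raise ValueError(f"price out of range for a binary market: {text!r}")
    return cents


# Spread for a book quoted at best_bid "0.999" / best_ask "1", in cents.
spread = price_to_cents("1") - price_to_cents("0.999")
```

Reading "1" as 1¢ instead would put the ask far below the bid and produce a large negative spread.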

Fixes (already in the repo):

| Layer | Change |
| --- | --- |
| tools/polymarket_config.py | Default YES token + market IDs derived from historical_data.jsonl |
| tools/replay_server.py + tools/replay_filter.py | Replay only book / price_change / best_bid_ask for one asset_id; skip new_market |
| include/market_parser.hpp | set_asset_filter(); parse best_bid_ask; fix "1" → 100¢ |
| src/websocket_client.cpp | Polymarket subscribe uses assets_ids (for live, not replay) |
| src/main.cpp | --asset-id (defaults to engine_config::kDefaultPolymarketAssetId) |

Mock PnL often stays at 0 even when T rows appear — that is not “no arbitrage in the feed.” The engine uses simulated market orders at the current best bid/ask when |alpha score| ≥ 0.6. On the bundled recording the market sits near 99.9¢ / $1.00, so buys and sells hit almost the same price and round-trip fills lock in ~zero realized PnL. You are seeing signal + mock execution fire, not a profitable strategy.
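The arithmetic behind the near-zero PnL can be checked with a small sketch. It assumes full fills at the top of book and no fees; `round_trip_pnl` is a hypothetical helper, not part of the engine.

```python
from decimal import Decimal


def round_trip_pnl(best_bid_cents, best_ask_cents, size):
    """Realized PnL in cents for buying `size` at the ask, then selling at the bid."""
    buy_cost = best_ask_cents * size
    sell_proceeds = best_bid_cents * size
    return sell_proceeds - buy_cost


# Bundled recording: market near 99.9 cents / 100 cents, 10 contracts.
tight_book = round_trip_pnl(Decimal("99.9"), Decimal("100"), 10)
```

On a book that tight, a 10-contract round trip moves realized PnL by only one cent, which rounds to roughly zero on the dashboard.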

Recommended Environment Setup (Conda + Python tooling)

This repo has two independent “worlds”:

  • C++ build/run (CMake + Clang/GCC + system/Homebrew libs)
  • Python tooling (used for local websocket tooling in tools/: recorder, replay server, mock server, dashboard)

To keep the C++ toolchain deterministic on macOS, it’s recommended to deactivate Conda base in the terminal you use for building/running C++:

conda deactivate

For the Python tooling (mock server, replay server, recorder, dashboard), use a repo-local virtualenv instead of installing packages into Conda base:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip websockets certifi

Practical workflow: run the Python server in one terminal (with (.venv) active), and build/run the engine in another terminal (with neither (.venv) nor (base) active).

Recording real Polymarket data (JSONL)

The recorder in tools/record_polymarket.py connects to Polymarket’s public Market Channel websocket and appends every incoming JSON message to historical_data.jsonl.

  1. Ensure the repo-local venv has websockets installed:
source .venv/bin/activate
python -m pip install -U pip websockets certifi
  2. Set the token to record in tools/polymarket_config.py (DEFAULT_ASSET_ID) or in tools/record_polymarket.py (ASSET_IDS imports that default). These are Polymarket asset IDs / token IDs (one leg of a binary market), not a human ticker symbol.

  3. Run the recorder:

source .venv/bin/activate
python tools/record_polymarket.py
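For orientation, the append step of a JSONL recorder can be sketched without any networking. The field names `local_timestamp_ns` and `raw_message` are the ones the replay server is described as reading; the exact record layout written by tools/record_polymarket.py is not shown in this README, so treat the shape below (and the helper name `append_record`) as an assumption.

```python
import json
import time


def append_record(path, raw_message):
    """Append one websocket payload to a JSONL file with a local receive timestamp."""
    record = {
        "local_timestamp_ns": time.time_ns(),  # wall-clock receive time in ns
        "raw_message": raw_message,            # payload exactly as received
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")     # one JSON object per line
    return record
```

Keeping the raw payload untouched lets a replay feed the engine the same bytes it would see on the wire.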

Replaying recorded Polymarket data (local TLS websocket)

The replay server in tools/replay_server.py serves historical_data.jsonl over a local TLS websocket and replays messages in order.

It prefers replaying the original on-the-wire websocket payload when available (raw_message / raw fields), uses local_timestamp_ns to reproduce short-term burstiness, and filters each line to a single asset (default: DEFAULT_ASSET_ID in tools/polymarket_config.py) so YES/NO legs are not merged into one book.
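The single-asset filtering can be sketched as follows. The event names (book, price_change, best_bid_ask, new_market) come from this README; the JSON field names `event_type`, `asset_id`, and `price_changes` are assumptions about the payload shape, and `filter_line` is a hypothetical stand-in for what tools/replay_filter.py does.

```python
import json

KEPT_EVENTS = {"book", "price_change", "best_bid_ask"}


def filter_line(line, asset_id):
    """Return the message restricted to one asset, or None if nothing remains."""
    msg = json.loads(line)
    event_type = msg.get("event_type")
    if event_type not in KEPT_EVENTS:
        return None  # e.g. new_market broadcasts
    if event_type == "price_change" and "price_changes" in msg:
        # Dual-leg payload: keep only the entries for the requested token.
        legs = [c for c in msg["price_changes"] if c.get("asset_id") == asset_id]
        if not legs:
            return None
        return {**msg, "price_changes": legs}
    return msg if msg.get("asset_id") == asset_id else None
```

Dropping the other leg before the message reaches the engine is what keeps a 60¢ YES quote and a 40¢ NO quote from landing in the same book.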

Override the filter:

REPLAY_ASSET_IDS=96351650250139397447438653380483970772060142397849794315678720298272472897874 python tools/replay_server.py

  1. Ensure the repo-local venv has dependencies installed:
source .venv/bin/activate
python -m pip install -U pip websockets
  2. Ensure you have a local TLS cert/key for the server:
./tools/gen_self_signed_cert.sh

This generates tools/cert.pem and tools/key.pem (used by the replay server).

  3. Run the replay server:
source .venv/bin/activate
python tools/replay_server.py

Optional: cap inter-message sleep time to avoid multi-minute gaps if your JSONL contains time discontinuities (default is 0.5 seconds):

REPLAY_MAX_SLEEP_S=0.1 python tools/replay_server.py

  4. Point the engine at the replay server (use the same asset id as replay filter):
./build/engine --strategy ofi 127.0.0.1 8765 /
# optional explicit asset (default matches polymarket_config.py):
# ./build/engine --strategy ofi --asset-id <token_id> 127.0.0.1 8765 /
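The pacing used during replay, including the REPLAY_MAX_SLEEP_S cap, amounts to sleeping for the recorded gap between consecutive messages, clamped to a maximum. A minimal sketch under that assumption (the function name `replay_delays` is hypothetical, and the real server may handle edge cases differently):

```python
def replay_delays(timestamps_ns, max_sleep_s=0.5):
    """Return the sleep in seconds to apply before each message after the first."""
    delays = []
    for prev, cur in zip(timestamps_ns, timestamps_ns[1:]):
        gap_s = max(cur - prev, 0) / 1e9  # treat out-of-order timestamps as no gap
        delays.append(min(gap_s, max_sleep_s))
    return delays
```

A two-minute hole in the recording therefore costs at most `max_sleep_s` during replay, while sub-cap gaps keep their original spacing.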

At this point trading_log.csv should accumulate P (mark-to-market) and some T (mock trade) rows. Realized PnL may remain 0.0 on this dataset for the reasons above; use tools/mock_wss_server.py if you want exaggerated PnL swings for dashboard demos.

The Streamlit dashboard should stop showing the “headers only” warning once data is flowing.

Real-time Dashboard (Streamlit)

The Streamlit dashboard reads a CSV file named trading_log.csv (by default).

Required columns (used for plots/metrics):

timestamp_us,event_type,price,size,realized_pnl

Additional columns written by the engine:

  • latency_us: tick-to-log processing latency in microseconds (captures alpha + decision overhead)
  • strategy: strategy name for metadata rows

Event types:

  • T: trade event
  • P: periodic mark-to-market point
  • M: metadata (e.g., active strategy name)

  1. Install dashboard deps (recommended inside the repo-local (.venv)):
source .venv/bin/activate
python -m pip install -U streamlit pandas plotly streamlit-autorefresh
  2. Run the dashboard:
streamlit run tools/dashboard.py
  3. Point it at a different log file (optional):
TRADING_LOG_PATH=/path/to/trading_log.csv streamlit run tools/dashboard.py

Note: the dashboard will show a warning until trading_log.csv exists and has data.

The sidebar also shows the Active Strategy (parsed from metadata rows written at engine startup).

Optional (quick sanity check): generate a tiny sample log file:

python - <<'PY'
import csv
import time

rows = [
  (0, 'P', 0.0, 0, 0.0),
  (500_000, 'T', 0.59, 10, 0.0),
  (1_000_000, 'P', 0.0, 0, 0.12),
  (1_500_000, 'T', 0.57, -10, 0.12),
  (2_000_000, 'P', 0.0, 0, 0.20),
]

with open('trading_log.csv', 'w', newline='') as f:
  w = csv.writer(f)
  w.writerow(['timestamp_us', 'event_type', 'price', 'size', 'realized_pnl'])
  base = int(time.time() * 1_000_000)
  for t, et, price, size, pnl in rows:
    w.writerow([base + t, et, price, size, pnl])
print('wrote trading_log.csv')
PY
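To confirm that a log (sample or real) has the shape the dashboard expects, a short reader can count rows per event type and report the last realized PnL. This is a sketch that relies only on the required columns listed above; `summarize_log` is a hypothetical helper, not part of tools/.

```python
import csv
from collections import Counter


def summarize_log(path):
    """Count rows per event type and report the last realized PnL seen."""
    counts = Counter()
    last_pnl = None
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["event_type"]] += 1
            if row.get("realized_pnl"):
                last_pnl = float(row["realized_pnl"])
    return dict(counts), last_pnl
```

A log with P rows but no T rows means the engine is marking to market but no signal has crossed the trade threshold.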

Note for macOS users in conda (base): conda often injects search paths that can cause a mixed Boost install to be detected (e.g., Homebrew BoostConfig + conda boost_system). The build defaults to ignoring CONDA_PREFIX during dependency discovery; override with:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLPME_IGNORE_CONDA_PREFIX=OFF

Expected output: the engine is mostly quiet when healthy. You should see trading_log.csv being written and the dashboard updating; on Ctrl+C you should see Shutdown complete.

Project Structure

.
├── CMakeLists.txt          # Root build configuration (strict flags, deps)
├── include/                # Public API headers (orderbook, engine interfaces)
├── src/
│   ├── main.cpp            # Entry point / wiring / shutdown
│   ├── strategy_engine.cpp # Strategy thread loop
│   └── websocket_client.cpp # Boost.Beast WSS client
├── tests/
│   └── test_main.cpp       # GoogleTest suite
├── tools/
│   ├── polymarket_config.py  # Default asset/market IDs (shared by recorder + replay)
│   ├── replay_filter.py      # Single-asset JSONL replay filtering
│   ├── mock_wss_server.py    # Local TLS websocket tick generator
│   ├── record_polymarket.py  # Record Polymarket Market Channel JSONL
│   ├── replay_server.py      # Replay historical_data.jsonl over local WSS
│   └── dashboard.py          # Streamlit dashboard for trading_log.csv
├── .gitignore
├── README.md
└── build/                  # Generated (ignored)

Development

Adding Components

  1. Place headers in include/
  2. Implementation in src/
  3. Tests in tests/
  4. Update CMakeLists.txt targets as modules grow (consider add_library for core engine)

Testing

# After building (see Build & Run)
ctest --test-dir build -V

Performance Tuning

  • Profile with perf, Valgrind, or Tracy
  • Ensure -march=native matches production hardware
  • Monitor cache misses and branch prediction in hot paths (JSON parsing, book updates, and the strategy loop today; order matching and risk checks once they exist)

Dependencies (Managed by CMake)

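  • simdjson (v3.6.3): Zero-allocation JSON parsing (fetched via FetchContent)
  • GoogleTest (v1.15.2): Unit testing (fetched via FetchContent)
  • Boost: system + thread (required; system install located by CMake)
  • OpenSSL: TLS for wss:// (system install located by CMake)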

GitHub Setup

This repository includes:

  • Comprehensive .gitignore for C++/CMake/IDE artifacts
  • GitHub Actions CI (see .github/workflows/ci.yml) for automated builds/tests across platforms
  • Modern CMake with dependency management and macOS/conda-friendly Boost discovery

CI/CD Recommendations

  • Extend the GitHub Actions workflow to cover:
    • Linux (Ubuntu) + macOS matrix
    • Release builds with sanitizers (-fsanitize=address,undefined)
    • Benchmarking and performance regression tests
  • Use clang-tidy and include-what-you-use for static analysis
  • Consider conan or vcpkg for full dependency management in larger projects

Roadmap

  • More realistic order book model (multi-asset routing, sequencing/consistency checks)
  • Polymarket Market Channel subscribe (assets_ids) + single-asset parser/replay filter (implemented; see the replay fixes above)
  • Live reconnect with backoff, application-level PING/PONG, and a TLS verification policy in the C++ websocket client
  • Matching engine with low-latency priority queues
  • Risk management and position tracking
  • Benchmark suite (latency histograms, throughput)
  • Python bindings (via pybind11) for research

License

This project is currently unlicensed (all rights reserved).

Contributing

Not currently accepting external contributions.


Built for high-frequency prediction market execution. Questions? Open an issue.
