L2 OrderBook High-Frequency Execution Engine

A C++20, low-latency market-data → strategy engine skeleton built around the core HFT loop: ingest L2 updates over WSS, normalize them into a small internal representation, hand off across threads via a lock-free SPSC queue, and run a busy-wait strategy loop with microsecond-level latency visibility.

The strategy layer is intentionally modular: you can run a single alpha signal or an ensemble of multiple signals in the same process (selected at startup), combine their confidence scores, and emit mock fills/PnL to CSV for rapid iteration.
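The README does not say how the confidence scores are combined, so the following is only a minimal sketch under stated assumptions: each alpha returns a score in [-1, 1], the ensemble is an equal-weight average, and a trade fires when the magnitude reaches the 0.6 threshold mentioned later in this document. `BookTop`, `Alpha`, `combine_scores`, and `should_trade` are hypothetical names, not the engine's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical top-of-book snapshot; the engine's real types may differ.
struct BookTop {
    double best_bid;
    double best_ask;
    double bid_size;
    double ask_size;
};

// An alpha maps a snapshot to a confidence score in [-1, 1]
// (positive = buy pressure, negative = sell pressure).
using Alpha = std::function<double(const BookTop&)>;

// Assumption: equal-weight average of the clamped scores.
inline double combine_scores(const std::vector<Alpha>& alphas, const BookTop& top) {
    if (alphas.empty()) return 0.0;
    double sum = 0.0;
    for (const auto& alpha : alphas) {
        const double score = alpha(top);
        assert(!std::isnan(score));  // a NaN score would poison the average
        sum += std::clamp(score, -1.0, 1.0);
    }
    return sum / static_cast<double>(alphas.size());
}

// Trade only when the combined score clears the threshold in magnitude.
inline bool should_trade(double score, double threshold = 0.6) {
    return std::fabs(score) >= threshold;
}
```

With two alphas scoring 1.0 and 0.5, the combined score is 0.75 and a trade would fire. A weighted average or a veto rule would slot into `combine_scores` without touching the alphas themselves.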

Current Status: Working end-to-end pipeline (local replay/mock tooling included) with a pluggable strategy/alpha framework. Not a production execution system yet (no real venue auth/subscribe, reconnect/ping-pong, OMS, or risk).

Features

C++20 with strict compiler enforcement (-Wall -Wextra -Wpedantic -Werror)
Performance: Release builds use -O3 -march=native, which tunes code generation for the CPU of the build machine, so build on (or for) the hardware you deploy to
SIMD JSON parsing: simdjson on-demand parsing with a fixed-capacity buffer (no per-message allocations; one memcpy into a padded buffer)
Networking (WSS): Boost.Asio + Boost.Beast over OpenSSL (TLS)
Concurrency: dedicated network thread + dedicated strategy thread, coordinated via a lock-free SPSC queue
Testing: GoogleTest integration with CMake FetchContent
Build System: Modern CMake 3.20+ with FetchContent for dependencies
Modular Signals (Alphas): Strategy runs one or more pluggable alpha signals and combines their confidence scores
Latency Visibility: CSV logs include per-tick processing latency (latency_us) so you can quantify signal overhead
Polymarket-aware ingestion: Single-asset filtering for binary markets (YES/NO legs), correct assets_ids subscribe, best_bid_ask parsing, and filtered JSONL replay
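The lock-free SPSC hand-off listed under Concurrency can be illustrated with a bounded ring buffer. This is a sketch of the general technique, not the queue shipped in this repo: `SpscQueue`, its method names, and the 64-byte cache-line figure are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Illustrative bounded single-producer/single-consumer ring buffer.
// Not the engine's actual queue; T must be default-constructible and copyable.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity)
        : slots_(capacity), mask_(capacity - 1) {
        // A power-of-two capacity lets us wrap indices with a bit mask.
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    // Producer thread only. Returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == slots_.size()) return false;
        slots_[head & mask_] = value;
        // Release publishes the slot write before the new head is visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[tail & mask_];
        // Release hands the slot back to the producer after the read.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    std::vector<T> slots_;
    std::size_t mask_;
    // Separate cache lines to limit false sharing between the two threads.
    alignas(64) std::atomic<std::size_t> head_{0};  // next slot to write
    alignas(64) std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

In a layout like the one described above, the network thread would call `try_push` for each normalized update and the strategy thread would spin on `try_pop`. Neither side ever takes a lock or allocates after construction.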

Quick Start

Prerequisites

CMake ≥ 3.20
C++20 compiler (GCC 11+, Clang 13+, MSVC 2019+)

Boost (system + thread components) — install via:

# macOS
brew install boost
# Ubuntu
sudo apt install libboost-system-dev libboost-thread-dev

OpenSSL (TLS) — install via:

# macOS
brew install openssl@3
# Ubuntu
sudo apt install libssl-dev

Git

Build & Run

# Clone and configure
git clone <your-repo-url>
cd low-latency-prediction-market-engine

# Clean previous build (important after dependency/search-path changes)
rm -rf build

# Configure (uses FetchContent for simdjson + GoogleTest)
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release

# Build (engine + tests)
cmake --build build -j$(getconf _NPROCESSORS_ONLN)

# Run the engine
# Note: with no host/port args, defaults to example.com:443 (not useful for Polymarket).
# For local dev, use mock or replay (below) and point at 127.0.0.1:8765.

# Strategy selection (default: both). On real/replay Polymarket data, ofi often fires more than momentum.
./build/engine --strategy momentum
./build/engine --strategy ofi
./build/engine --strategy both

# Polymarket asset (token) filter — default matches tools/polymarket_config.py / historical_data.jsonl
./build/engine --strategy ofi --asset-id 96351650250139397447438653380483970772060142397849794315678720298272472897874 127.0.0.1 8765 /

# Run tests
ctest --test-dir build -V

Deterministic local run (recommended)

Generate a local TLS cert/key (used by the Python WSS servers):

./tools/gen_self_signed_cert.sh

In one terminal, start a local websocket server:

# Requires the repo-local venv (see Recommended Environment Setup below)
source .venv/bin/activate
python tools/mock_wss_server.py
# or
python tools/replay_server.py

In another terminal, point the engine at it:

./build/engine --strategy ofi 127.0.0.1 8765 /

If port 8765 is already in use, stop the old server (lsof -nP -iTCP:8765 -sTCP:LISTEN, then kill <pid>) or change the port in tools/replay_server.py and match it in the engine args.

Polymarket replay: why trades were zero before (and what we fixed)

Replay does not use the websocket subscribe message (the replay server ignores it). Trades failed on real JSONL for three separate reasons:

Dual-leg price_change messages — Polymarket sends YES and NO token updates in one payload. The engine keeps a single order book, so applying both legs mixed two unrelated prices (e.g. 60¢ and 40¢) into one book. Alphas then saw nonsense spreads and rarely crossed the trade threshold.
Replay noise — Most lines in historical_data.jsonl are new_market broadcasts (~80% of rows), not L2 updates. Replaying them unfiltered wasted time and never improved the book.
Price parsing edge cases — Top-of-book fields like "best_ask":"1" mean $1.00 (100¢); the parser used to treat "1" as 1¢, breaking the spread.
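The price parsing edge case is easiest to see with fixed-point arithmetic. The sketch below is not the code in include/market_parser.hpp. It assumes prices arrive as non-negative decimal dollar strings and picks a hypothetical scale of 0.0001 USD per unit, so that "1" becomes 10000 units (100¢) rather than 1¢. `parse_price_units` and `units_to_cents` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Assumed fixed-point scale: 1 unit = 0.0001 USD (a hundredth of a cent).
constexpr std::int64_t kUnitsPerDollar = 10'000;
constexpr int kMaxFractionDigits = 4;

// Parses a non-negative decimal dollar string such as "1", "0.6" or "0.999".
// Returns std::nullopt on malformed input or more than four fractional digits.
inline std::optional<std::int64_t> parse_price_units(std::string_view text) {
    std::int64_t whole = 0;
    std::int64_t frac = 0;
    int frac_digits = 0;
    bool saw_digit = false;
    std::size_t i = 0;
    for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i) {
        whole = whole * 10 + (text[i] - '0');
        saw_digit = true;
        if (whole > 1'000'000) return std::nullopt;  // implausible price; avoid overflow
    }
    if (i < text.size() && text[i] == '.') {
        ++i;
        for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i) {
            if (frac_digits == kMaxFractionDigits) return std::nullopt;
            frac = frac * 10 + (text[i] - '0');
            ++frac_digits;
            saw_digit = true;
        }
    }
    if (!saw_digit || i != text.size()) return std::nullopt;
    // Pad the fraction so "0.6" and "0.6000" yield the same value.
    for (; frac_digits < kMaxFractionDigits; ++frac_digits) frac *= 10;
    return whole * kUnitsPerDollar + frac;
}

// Converts fixed-point units to cents for display (100 units = 1 cent).
inline double units_to_cents(std::int64_t units) {
    assert(units >= 0);
    return static_cast<double>(units) / 100.0;
}
```

Keeping prices as integers end to end means the spread between "0.999" and "1" is exactly 10 units, with no floating-point rounding in the comparison.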

Fixes (already in the repo):

| File | Change |
| --- | --- |
| `tools/polymarket_config.py` | Default YES token + market IDs derived from `historical_data.jsonl` |
| `tools/replay_server.py` + `tools/replay_filter.py` | Replay only `book` / `price_change` / `best_bid_ask` for one `asset_id`; skip `new_market` |
| `include/market_parser.hpp` | `set_asset_filter()`; parse `best_bid_ask`; fix `"1"` → 100¢ |
| `src/websocket_client.cpp` | Polymarket subscribe uses `assets_ids` (for live, not replay) |
| `src/main.cpp` | `--asset-id` (defaults to `engine_config::kDefaultPolymarketAssetId`) |
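The single-asset filtering idea can be sketched as follows. The real `set_asset_filter()` signature is not shown in this README, so `LevelUpdate`, `AssetFilter`, and the placeholder token names are hypothetical. The example only illustrates dropping the other leg of a dual-leg payload before it reaches the book.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical normalized level update; field names are illustrative only.
struct LevelUpdate {
    std::string asset_id;  // token ID of one leg (YES or NO)
    double price;          // dollars
    double size;
};

// Keeps updates for a single asset so YES and NO legs never share a book.
class AssetFilter {
public:
    explicit AssetFilter(std::string asset_id) : asset_id_(std::move(asset_id)) {
        assert(!asset_id_.empty());
    }

    bool accepts(std::string_view asset_id) const { return asset_id == asset_id_; }

    // Returns only the updates that belong to the configured asset.
    std::vector<LevelUpdate> apply(const std::vector<LevelUpdate>& updates) const {
        std::vector<LevelUpdate> kept;
        for (const auto& update : updates) {
            if (accepts(update.asset_id)) kept.push_back(update);
        }
        return kept;
    }

private:
    std::string asset_id_;
};
```

Given a payload carrying a 60¢ YES update and a 40¢ NO update, a filter constructed with the YES token keeps only the first, so the book never sees the unrelated 40¢ price.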

Mock PnL often stays at 0 even when T rows appear. That does not mean “no arbitrage in the feed.” The engine simulates market orders at the current best bid/ask whenever |alpha score| ≥ 0.6. On the bundled recording the market sits near 99.9¢ / $1.00, so buys and sells fill at almost the same price and round trips realize roughly zero PnL. What you are seeing is the signal and mock execution firing, not a profitable strategy.
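The arithmetic behind the near-zero PnL can be worked through in a few lines. This is a sketch, not the engine's fill logic: it assumes buys fill at the best ask and sells at the best bid, and it uses a hypothetical fixed-point scale of 0.0001 USD per unit so the numbers stay exact. `TopOfBook` and `round_trip_pnl` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Prices in fixed-point units of 0.0001 USD (assumed scale for this sketch).
struct TopOfBook {
    std::int64_t best_bid;
    std::int64_t best_ask;
};

// Simulated market orders: buys lift the ask, sells hit the bid.
inline std::int64_t buy_fill_price(const TopOfBook& top) { return top.best_ask; }
inline std::int64_t sell_fill_price(const TopOfBook& top) { return top.best_bid; }

// Realized PnL, in price units times shares, of buying `shares` at the
// entry snapshot and selling them at the exit snapshot.
inline std::int64_t round_trip_pnl(const TopOfBook& at_entry,
                                   const TopOfBook& at_exit,
                                   std::int64_t shares) {
    assert(shares > 0);
    return (sell_fill_price(at_exit) - buy_fill_price(at_entry)) * shares;
}
```

With the book at 99.9¢ / $1.00 on both legs of the trip (`TopOfBook{9990, 10000}`), a 10-share round trip returns -100 units, which is a loss of one cent: the spread cost and nothing more.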

Recommended Environment Setup (Conda + Python tooling)

This repo has two independent “worlds”:

C++ build/run (CMake + Clang/GCC + system/Homebrew libs)
Python tooling (the scripts in tools/: recorder, replay server, mock server, dashboard)

To keep the C++ toolchain deterministic on macOS, it’s recommended to deactivate Conda base in the terminal you use for building/running C++:

conda deactivate

For the Python tools (mock server, replay server, recorder, dashboard), use a repo-local virtualenv instead of installing packages into Conda base:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip websockets certifi

Practical workflow: run the Python server in one terminal (with (.venv) active), and build/run the engine in another terminal (with neither (.venv) nor (base) active).

Recording real Polymarket data (JSONL)

The recorder in tools/record_polymarket.py connects to Polymarket’s public Market Channel websocket and appends every incoming JSON message to historical_data.jsonl.

Ensure the repo-local venv has websockets installed:

source .venv/bin/activate
python -m pip install -U pip websockets certifi

Set the token to record in tools/polymarket_config.py (DEFAULT_ASSET_ID) or in tools/record_polymarket.py (ASSET_IDS imports that default). These are Polymarket asset IDs / token IDs (one leg of a binary market), not a human ticker symbol.

Run the recorder:

source .venv/bin/activate
python tools/record_polymarket.py

Replaying recorded Polymarket data (local TLS websocket)

The replay server in tools/replay_server.py serves historical_data.jsonl over a local TLS websocket and replays messages in order.

It prefers replaying the original on-the-wire websocket payload when available (raw_message / raw fields), uses local_timestamp_ns to reproduce short-term burstiness, and filters each line to a single asset (default: DEFAULT_ASSET_ID in tools/polymarket_config.py) so YES/NO legs are not merged into one book.

Override the filter:

REPLAY_ASSET_IDS=96351650250139397447438653380483970772060142397849794315678720298272472897874 python tools/replay_server.py

Ensure the repo-local venv has dependencies installed:

source .venv/bin/activate
python -m pip install -U pip websockets

Ensure you have a local TLS cert/key for the server:

./tools/gen_self_signed_cert.sh

This generates tools/cert.pem and tools/key.pem (used by the replay server).

Run the replay server:

source .venv/bin/activate
python tools/replay_server.py

Optional: cap inter-message sleep time to avoid multi-minute gaps if your JSONL contains time discontinuities (default is 0.5 seconds):

REPLAY_MAX_SLEEP_S=0.1 python tools/replay_server.py
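The capping rule itself is small. The replay server is a Python script, so the C++ below only illustrates the logic described above and is not taken from the script: the delay before each message is the recorded gap between consecutive local_timestamp_ns values, clamped between zero and the cap. `replay_delay` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Delay before sending the next recorded message: the recorded gap between
// two local_timestamp_ns values, clamped to [0, max_sleep].
inline std::chrono::nanoseconds replay_delay(std::int64_t prev_ts_ns,
                                             std::int64_t next_ts_ns,
                                             std::chrono::nanoseconds max_sleep) {
    assert(max_sleep >= std::chrono::nanoseconds::zero());
    const std::chrono::nanoseconds gap{next_ts_ns - prev_ts_ns};
    // Out-of-order timestamps give a negative gap, which clamps to zero.
    return std::clamp(gap, std::chrono::nanoseconds::zero(), max_sleep);
}
```

A 20 ms gap passes through unchanged, while a three-minute hole in the recording collapses to the cap (0.5 s by default, or 0.1 s with the override shown above).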

Point the engine at the replay server (use the same asset ID as the replay filter):

./build/engine --strategy ofi 127.0.0.1 8765 /
# optional explicit asset (default matches polymarket_config.py):
# ./build/engine --strategy ofi --asset-id <token_id> 127.0.0.1 8765 /

At this point trading_log.csv should accumulate P (mark-to-market) and some T (mock trade) rows. Realized PnL may remain 0.0 on this dataset for the reasons above; use tools/mock_wss_server.py if you want exaggerated PnL swings for dashboard demos.

The Streamlit dashboard should stop showing the “headers only” warning once data is flowing.

Real-time Dashboard (Streamlit)

The Streamlit dashboard reads a CSV file named trading_log.csv (by default).

Required columns (used for plots/metrics):

timestamp_us,event_type,price,size,realized_pnl

Additional columns written by the engine:

latency_us: tick-to-log processing latency in microseconds (captures alpha + decision overhead)
strategy: strategy name for metadata rows

Event types:

T: trade event
P: periodic mark-to-market point
M: metadata (e.g., active strategy name)
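To make the row layout concrete, here is a sketch of how one log line could be produced. The column order (the required columns followed by latency_us and strategy) is an assumption, as are the `LogRow`, `to_csv`, and `elapsed_us` names. The engine's actual writer may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <sstream>
#include <string>

// One row of trading_log.csv, in the column order assumed for this sketch:
// timestamp_us,event_type,price,size,realized_pnl,latency_us,strategy
struct LogRow {
    std::int64_t timestamp_us;
    char event_type;       // 'T' trade, 'P' mark-to-market, 'M' metadata
    double price;
    double size;
    double realized_pnl;
    std::int64_t latency_us;
    std::string strategy;  // expected to be non-empty only on metadata rows
};

// Microseconds between receiving a tick and writing its log row.
inline std::int64_t elapsed_us(std::chrono::steady_clock::time_point start,
                               std::chrono::steady_clock::time_point end) {
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

// Formats a row as a single CSV line without a trailing newline.
inline std::string to_csv(const LogRow& row) {
    assert(row.event_type == 'T' || row.event_type == 'P' || row.event_type == 'M');
    std::ostringstream out;
    out << row.timestamp_us << ',' << row.event_type << ',' << row.price << ','
        << row.size << ',' << row.realized_pnl << ',' << row.latency_us << ','
        << row.strategy;
    return out.str();
}
```

`steady_clock` is used for the latency measurement because it is monotonic. A wall-clock source would still be needed for the timestamp_us column itself.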

Install dashboard deps (recommended inside the repo-local (.venv)):

source .venv/bin/activate
python -m pip install -U streamlit pandas plotly streamlit-autorefresh

Run the dashboard:

streamlit run tools/dashboard.py

Point it at a different log file (optional):

TRADING_LOG_PATH=/path/to/trading_log.csv streamlit run tools/dashboard.py

Note: the dashboard will show a warning until trading_log.csv exists and has data.

The sidebar also shows the Active Strategy (parsed from metadata rows written at engine startup).

Optional (quick sanity check): generate a tiny sample log file:

python - <<'PY'
import csv
import time

rows = [
  (0, 'P', 0.0, 0, 0.0),
  (500_000, 'T', 0.59, 10, 0.0),
  (1_000_000, 'P', 0.0, 0, 0.12),
  (1_500_000, 'T', 0.57, -10, 0.12),
  (2_000_000, 'P', 0.0, 0, 0.20),
]

with open('trading_log.csv', 'w', newline='') as f:
  w = csv.writer(f)
  w.writerow(['timestamp_us', 'event_type', 'price', 'size', 'realized_pnl'])
  base = int(time.time() * 1_000_000)
  for t, et, price, size, pnl in rows:
    w.writerow([base + t, et, price, size, pnl])
print('wrote trading_log.csv')
PY

Note for macOS users in conda (base): conda often injects search paths that can cause a mixed Boost install to be detected (e.g., Homebrew BoostConfig + conda boost_system). The build defaults to ignoring CONDA_PREFIX during dependency discovery; override with:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLPME_IGNORE_CONDA_PREFIX=OFF

Expected output: the engine is mostly quiet when healthy. You should see trading_log.csv being written and the dashboard updating; on Ctrl+C you should see Shutdown complete.

Project Structure

.
├── CMakeLists.txt          # Root build configuration (strict flags, deps)
├── include/                # Public API headers (orderbook, engine interfaces)
├── src/
│   ├── main.cpp            # Entry point / wiring / shutdown
│   ├── strategy_engine.cpp # Strategy thread loop
│   └── websocket_client.cpp # Boost.Beast WSS client
├── tests/
│   └── test_main.cpp       # GoogleTest suite
├── tools/
│   ├── polymarket_config.py  # Default asset/market IDs (shared by recorder + replay)
│   ├── replay_filter.py      # Single-asset JSONL replay filtering
│   ├── mock_wss_server.py    # Local TLS websocket tick generator
│   ├── record_polymarket.py  # Record Polymarket Market Channel JSONL
│   ├── replay_server.py      # Replay historical_data.jsonl over local WSS
│   └── dashboard.py          # Streamlit dashboard for trading_log.csv
├── .gitignore
├── README.md
└── build/                  # Generated (ignored)

Development

Adding Components

Place headers in include/
Implementation in src/
Tests in tests/
Update CMakeLists.txt targets as modules grow (consider add_library for core engine)

Testing

# After building (see Build & Run)
ctest --test-dir build -V

Performance Tuning

Profile with perf, Valgrind, or Tracy
Ensure -march=native matches production hardware
Monitor cache misses and branch prediction in hot paths (JSON parsing, order book updates, the strategy loop)

Dependencies (Managed by CMake)

simdjson (v3.6.3): Zero-allocation JSON parsing
GoogleTest (v1.15.2): Unit testing
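Boost: system + thread (required; installed via the system package manager, see Prerequisites)
OpenSSL: TLS for wss:// (installed via the system package manager, see Prerequisites)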

GitHub Setup

This repository includes:

Comprehensive .gitignore for C++/CMake/IDE artifacts
GitHub Actions CI (see .github/workflows/ci.yml) for automated builds/tests across platforms
Modern CMake with dependency management and macOS/conda-friendly Boost discovery

CI/CD Recommendations

Extend the GitHub Actions workflow with:
- Linux (Ubuntu) + macOS matrix
- Release builds with sanitizers (-fsanitize=address,undefined)
- Benchmarking and performance regression tests
Use clang-tidy and include-what-you-use for static analysis
Consider conan or vcpkg for full dependency management in larger projects

Roadmap

More realistic order book model (multi-asset routing, sequencing/consistency checks)
Polymarket Market Channel subscribe (assets_ids) + single-asset parser/replay filter (done; see the fixes listed under Polymarket replay)
Live reconnect with backoff, application-level PING/PONG, and a TLS verification policy in the C++ websocket client
Matching engine with low-latency priority queues
Risk management and position tracking
Benchmark suite (latency histograms, throughput)
Python bindings (via pybind11) for research

License

Contributing

Not currently accepting external contributions.

Built for high-frequency prediction market execution. Questions? Open an issue.
