A C++20, low-latency market-data → strategy engine skeleton built around the core HFT loop: ingest L2 updates over WSS, normalize them into a small internal representation, hand off across threads via a lock-free SPSC queue, and run a busy-wait strategy loop with microsecond-level latency visibility.
The strategy layer is intentionally modular: you can run a single alpha signal or an ensemble of multiple signals in the same process (selected at startup), combine their confidence scores, and emit mock fills/PnL to CSV for rapid iteration.
Current Status: Working end-to-end pipeline (local replay/mock tooling included) with a pluggable strategy/alpha framework. Not a production execution system yet (no real venue auth/subscribe, reconnect/ping-pong, OMS, or risk).
- C++20 with strict compiler enforcement (-Wall -Wextra -Wpedantic -Werror)
- Performance: Release builds use -O3 -march=native for maximum speed on target hardware
- SIMD JSON parsing: simdjson on-demand parsing with a fixed-capacity buffer (no per-message allocations; one memcpy into a padded buffer)
- Networking (WSS): Boost.Asio + Boost.Beast over OpenSSL (TLS)
- Concurrency: dedicated network thread + dedicated strategy thread, coordinated via a lock-free SPSC queue
- Testing: GoogleTest integration with CMake FetchContent
- Build System: Modern CMake 3.20+ with FetchContent for dependencies
- Modular Signals (Alphas): Strategy runs one or more pluggable alpha signals and combines their confidence scores
- Latency Visibility: CSV logs include per-tick processing latency (latency_us) so you can quantify signal overhead
- Polymarket-aware ingestion: Single-asset filtering for binary markets (YES/NO legs), correct assets_ids subscribe, best_bid_ask parsing, and filtered JSONL replay
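The network-to-strategy handoff described above can be sketched as a bounded ring buffer with one producer and one consumer. This is a minimal illustration, not the engine's actual queue: the names `SpscQueue` and `Tick`, the power-of-two capacity, and the 64-byte cache-line size are all assumptions.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical tick type; the engine's real internal representation may differ.
struct Tick {
    std::int64_t price_ticks;
    std::int64_t size;
};

// Bounded single-producer/single-consumer ring buffer.
// Capacity must be a power of two so index wrap-around is a cheap bit mask.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Called only by the producer (network) thread.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) {
            return false;  // full: the caller decides whether to drop or retry
        }
        slots_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called only by the consumer (strategy) thread.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) {
            return std::nullopt;  // empty: a busy-wait loop would spin here
        }
        T value = slots_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Assumed 64-byte cache lines; keeps the two counters from false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

The counters only ever increase and are masked on access, so all `Capacity` slots are usable and "full" is simply `tail - head == Capacity`. The acquire/release pairs are what make the slot write visible before the consumer reads it.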
- CMake ≥ 3.20
- C++20 compiler (GCC 11+, Clang 13+, MSVC 2019+)
- Boost (system + thread components) — install via:
    # macOS
    brew install boost
    # Ubuntu
    sudo apt install libboost-system-dev libboost-thread-dev
- OpenSSL (TLS) — install via:
    # macOS
    brew install openssl@3
    # Ubuntu
    sudo apt install libssl-dev
- Git
# Clone and configure
git clone <your-repo-url>
cd low-latency-prediction-market-engine
# Clean previous build (important after dependency/search-path changes)
rm -rf build
# Configure (uses FetchContent for simdjson + GoogleTest)
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release
# Build (engine + tests)
cmake --build build -j$(getconf _NPROCESSORS_ONLN)
# Run the engine
# Note: with no host/port args, defaults to example.com:443 (not useful for Polymarket).
# For local dev, use mock or replay (below) and point at 127.0.0.1:8765.
# Strategy selection (default: both). On real/replay Polymarket data, ofi often fires more than momentum.
./build/engine --strategy momentum
./build/engine --strategy ofi
./build/engine --strategy both
# Polymarket asset (token) filter — default matches tools/polymarket_config.py / historical_data.jsonl
./build/engine --strategy ofi --asset-id 96351650250139397447438653380483970772060142397849794315678720298272472897874 127.0.0.1 8765 /
# Run tests
ctest --test-dir build -V
- Generate a local TLS cert/key (used by the Python WSS servers):
./tools/gen_self_signed_cert.sh
- In one terminal, start a local websocket server:
source .venv/bin/activate
python tools/mock_wss_server.py
# or
python tools/replay_server.py
- In another terminal, point the engine at it:
./build/engine --strategy ofi 127.0.0.1 8765 /
If port 8765 is already in use, stop the old server (lsof -nP -iTCP:8765 -sTCP:LISTEN, then kill <pid>) or change the port in tools/replay_server.py and match it in the engine args.
Replay does not use the websocket subscribe message (the replay server ignores it). Trades failed on real JSONL for three separate reasons:
- Dual-leg price_change messages: Polymarket sends YES and NO token updates in one payload. The engine keeps a single order book, so applying both legs mixed two unrelated prices (e.g. 60¢ and 40¢) into one book. Alphas then saw nonsense spreads and rarely crossed the trade threshold.
- Replay noise: Most lines in historical_data.jsonl are new_market broadcasts (~80% of rows), not L2 updates. Replaying them unfiltered wasted time and never improved the book.
- Price parsing edge cases: Top-of-book fields like "best_ask":"1" mean $1.00 (100¢); the parser used to treat "1" as 1¢, breaking the spread.
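To make the price-parsing edge case concrete, here is a small sketch that treats every price string as decimal dollars and converts it to integer tenths of a cent, so "1" becomes 100.0¢ instead of 1¢. The function name `parse_price_mills` and the three-digit precision are assumptions for illustration; the parser in include/market_parser.hpp may work differently.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper, not the engine's actual parser.
// Converts a decimal dollar string ("1", "0.6", "0.999") into integer
// tenths of a cent, so "1" is 1000 (100.0 cents) rather than 1 cent.
// Returns nullopt for empty input, stray characters, or more than three
// fractional digits. There is no overflow guard; prices are assumed to
// lie in [0, 1].
std::optional<std::int64_t> parse_price_mills(std::string_view text) {
    constexpr int kFractionDigits = 3;
    std::int64_t whole = 0;
    std::int64_t fraction = 0;
    int fraction_digits = 0;
    bool seen_dot = false;
    bool seen_digit = false;

    for (const char c : text) {
        if (c == '.') {
            if (seen_dot) return std::nullopt;  // second decimal point
            seen_dot = true;
        } else if (c >= '0' && c <= '9') {
            seen_digit = true;
            if (!seen_dot) {
                whole = whole * 10 + (c - '0');
            } else {
                if (fraction_digits == kFractionDigits) return std::nullopt;
                fraction = fraction * 10 + (c - '0');
                ++fraction_digits;
            }
        } else {
            return std::nullopt;  // sign, exponent, whitespace, etc.
        }
    }
    if (!seen_digit) return std::nullopt;

    // Right-pad the fraction: "0.6" has one digit and must become 600.
    for (; fraction_digits < kFractionDigits; ++fraction_digits) {
        fraction *= 10;
    }
    return whole * 1000 + fraction;
}
```

Working in integers avoids floating-point rounding when computing spreads, and a single code path for "1" and "0.999" removes the special case that caused the original bug.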
Fixes (already in the repo):
| Layer | Change |
|---|---|
| tools/polymarket_config.py | Default YES token + market IDs derived from historical_data.jsonl |
| tools/replay_server.py + tools/replay_filter.py | Replay only book / price_change / best_bid_ask for one asset_id; skip new_market |
| include/market_parser.hpp | set_asset_filter(); parse best_bid_ask; fix "1" → 100¢ |
| src/websocket_client.cpp | Polymarket subscribe uses assets_ids (for live, not replay) |
| src/main.cpp | --asset-id (defaults to engine_config::kDefaultPolymarketAssetId) |
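The single-asset filter amounts to discarding every update whose asset ID is not the configured one before it reaches the book. The sketch below shows the idea on already-decoded updates; `LevelUpdate` and `keep_asset` are hypothetical names, and the real set_asset_filter() presumably filters during parsing rather than copying into a new vector.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical decoded entry from a price_change payload.
struct LevelUpdate {
    std::string asset_id;      // Polymarket token ID (one leg of the market)
    std::int64_t price_mills;  // price in tenths of a cent
    std::int64_t size;
};

// Keeps only the updates for one leg so YES and NO prices never share a book.
// Allocates a new vector for clarity; a hot path would filter in place.
std::vector<LevelUpdate> keep_asset(const std::vector<LevelUpdate>& updates,
                                    std::string_view asset_id) {
    std::vector<LevelUpdate> kept;
    std::copy_if(updates.begin(), updates.end(), std::back_inserter(kept),
                 [asset_id](const LevelUpdate& update) {
                     return update.asset_id == asset_id;
                 });
    return kept;
}
```

For a payload carrying a YES update at 60¢ and a NO update at 40¢, filtering on the YES token leaves only the 60¢ level, which is what the single order book expects.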
Mock PnL often stays at 0 even when T rows appear — that is not “no arbitrage in the feed.” The engine uses simulated market orders at the current best bid/ask when |alpha score| ≥ 0.6. On the bundled recording the market sits near 99.9¢ / $1.00, so buys and sells hit almost the same price and round-trip fills lock in ~zero realized PnL. You are seeing signal + mock execution fire, not a profitable strategy.
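The arithmetic behind the flat PnL can be sketched as follows. The decision rule mirrors the 0.6 threshold stated above; the mapping of positive scores to buys, and the names `decide` and `round_trip_pnl_mills`, are assumptions for illustration. Prices are in tenths of a cent.

```cpp
#include <cmath>
#include <cstdint>

enum class Side { None, Buy, Sell };

// Assumed decision rule: trade only when |score| >= threshold, with the
// sign choosing the side. The sign convention is an assumption.
Side decide(double alpha_score, double threshold = 0.6) {
    if (std::fabs(alpha_score) < threshold) return Side::None;
    return alpha_score > 0.0 ? Side::Buy : Side::Sell;
}

// Realized PnL (in tenths of a cent) of buying `size` contracts at the ask
// and later selling them at the bid, as a market-order round trip does.
std::int64_t round_trip_pnl_mills(std::int64_t buy_at_ask,
                                  std::int64_t sell_at_bid,
                                  std::int64_t size) {
    return (sell_at_bid - buy_at_ask) * size;
}
```

With the bundled recording's quotes (bid 99.9¢ = 999, ask $1.00 = 1000) and a size of 10, a round trip at unchanged quotes realizes (999 - 1000) × 10 = -10 tenths of a cent, i.e. about -1¢, which is effectively zero on the dashboard.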
This repo has two independent “worlds”:
- C++ build/run (CMake + Clang/GCC + system/Homebrew libs)
- Python tooling (used for local websocket tooling in tools/: recorder, replay server, mock server, dashboard)
To keep the C++ toolchain deterministic on macOS, it’s recommended to deactivate Conda base in the terminal you use for building/running C++:
conda deactivate
For the Python mock server, use a repo-local virtualenv instead of installing packages into Conda base:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip websockets certifi
Practical workflow: run the Python server in one terminal (with (.venv) active), and build/run the engine in another terminal (with neither (.venv) nor (base) active).
The recorder in tools/record_polymarket.py connects to Polymarket’s public Market Channel websocket and appends every incoming JSON message to historical_data.jsonl.
- Ensure the repo-local venv has websockets installed:
source .venv/bin/activate
python -m pip install -U pip websockets certifi
- Set the token to record in tools/polymarket_config.py (DEFAULT_ASSET_ID) or in tools/record_polymarket.py (ASSET_IDS imports that default). These are Polymarket asset IDs / token IDs (one leg of a binary market), not a human ticker symbol.
- Run the recorder:
source .venv/bin/activate
python tools/record_polymarket.py
The replay server in tools/replay_server.py serves historical_data.jsonl over a local TLS websocket and replays messages in order.
It prefers replaying the original on-the-wire websocket payload when available (raw_message / raw fields), uses local_timestamp_ns to reproduce short-term burstiness, and filters each line to a single asset (default: DEFAULT_ASSET_ID in tools/polymarket_config.py) so YES/NO legs are not merged into one book.
Override the filter:
REPLAY_ASSET_IDS=96351650250139397447438653380483970772060142397849794315678720298272472897874 python tools/replay_server.py
- Ensure the repo-local venv has dependencies installed:
source .venv/bin/activate
python -m pip install -U pip websockets
- Ensure you have a local TLS cert/key for the server:
./tools/gen_self_signed_cert.sh
This generates tools/cert.pem and tools/key.pem (used by the replay server).
- Run the replay server:
source .venv/bin/activate
python tools/replay_server.py
Optional: cap inter-message sleep time to avoid multi-minute gaps if your JSONL contains time discontinuities (default is 0.5 seconds):
REPLAY_MAX_SLEEP_S=0.1 python tools/replay_server.py
- Point the engine at the replay server (use the same asset id as replay filter):
./build/engine --strategy ofi 127.0.0.1 8765 /
# optional explicit asset (default matches polymarket_config.py):
# ./build/engine --strategy ofi --asset-id <token_id> 127.0.0.1 8765 /
At this point trading_log.csv should accumulate P (mark-to-market) and some T (mock trade) rows. Realized PnL may remain 0.0 on this dataset for the reasons above; use tools/mock_wss_server.py if you want exaggerated PnL swings for dashboard demos.
The Streamlit dashboard should stop showing the “headers only” warning once data is flowing.
The Streamlit dashboard reads a CSV file named trading_log.csv (by default).
Required columns (used for plots/metrics):
timestamp_us,event_type,price,size,realized_pnl
Additional columns written by the engine:
- latency_us: tick-to-log processing latency in microseconds (captures alpha + decision overhead)
- strategy: strategy name for metadata rows
Event types:
- T: trade event
- P: periodic mark-to-market point
- M: metadata (e.g., active strategy name)
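As a rough illustration of the log format, the sketch below formats one row and computes a latency_us value from two clock readings. The struct `LogRow`, the helper names, and the placement of the two extra columns after the five required ones are assumptions; the engine's actual column order and number formatting may differ.

```cpp
#include <chrono>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical row layout: the five required columns followed by the two
// extra ones. The engine's actual column order may differ.
struct LogRow {
    std::int64_t timestamp_us;
    char event_type;          // 'T', 'P', or 'M'
    double price;
    std::int64_t size;
    double realized_pnl;
    std::int64_t latency_us;
    std::string strategy;     // only meaningful for 'M' rows
};

// Formats one row as a comma-separated line (no trailing newline).
std::string to_csv_line(const LogRow& row) {
    std::ostringstream out;
    out << row.timestamp_us << ',' << row.event_type << ',' << row.price << ','
        << row.size << ',' << row.realized_pnl << ',' << row.latency_us << ','
        << row.strategy;
    return out.str();
}

// Whole microseconds elapsed between two steady_clock readings, e.g. taken
// when a tick is dequeued and just before its row is written.
std::int64_t elapsed_us(std::chrono::steady_clock::time_point start,
                        std::chrono::steady_clock::time_point end) {
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
        .count();
}
```

A monotonic clock such as `std::chrono::steady_clock` is the usual choice for interval measurements like this, since it is not affected by wall-clock adjustments.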
- Install dashboard deps (recommended inside the repo-local (.venv)):
source .venv/bin/activate
python -m pip install -U streamlit pandas plotly streamlit-autorefresh
- Run the dashboard:
streamlit run tools/dashboard.py
- Point it at a different log file (optional):
TRADING_LOG_PATH=/path/to/trading_log.csv streamlit run tools/dashboard.py
Note: the dashboard will show a warning until trading_log.csv exists and has data.
The sidebar also shows the Active Strategy (parsed from metadata rows written at engine startup).
Optional (quick sanity check): generate a tiny sample log file:
python - <<'PY'
import csv
import time
rows = [
(0, 'P', 0.0, 0, 0.0),
(500_000, 'T', 0.59, 10, 0.0),
(1_000_000, 'P', 0.0, 0, 0.12),
(1_500_000, 'T', 0.57, -10, 0.12),
(2_000_000, 'P', 0.0, 0, 0.20),
]
with open('trading_log.csv', 'w', newline='') as f:
w = csv.writer(f)
w.writerow(['timestamp_us', 'event_type', 'price', 'size', 'realized_pnl'])
base = int(time.time() * 1_000_000)
for t, et, price, size, pnl in rows:
w.writerow([base + t, et, price, size, pnl])
print('wrote trading_log.csv')
PY
Note for macOS users in conda (base): conda often injects search paths that can cause a mixed Boost install to be detected (e.g., Homebrew BoostConfig + conda boost_system). The build defaults to ignoring CONDA_PREFIX during dependency discovery; override with:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLPME_IGNORE_CONDA_PREFIX=OFF
Expected output: the engine is mostly quiet when healthy. You should see trading_log.csv being written and the dashboard updating; on Ctrl+C you should see Shutdown complete.
.
├── CMakeLists.txt # Root build configuration (strict flags, deps)
├── include/ # Public API headers (orderbook, engine interfaces)
├── src/
│ ├── main.cpp # Entry point / wiring / shutdown
│ ├── strategy_engine.cpp # Strategy thread loop
│ └── websocket_client.cpp # Boost.Beast WSS client
├── tests/
│ └── test_main.cpp # GoogleTest suite
├── tools/
│ ├── polymarket_config.py # Default asset/market IDs (shared by recorder + replay)
│ ├── replay_filter.py # Single-asset JSONL replay filtering
│ ├── mock_wss_server.py # Local TLS websocket tick generator
│ ├── record_polymarket.py # Record Polymarket Market Channel JSONL
│ ├── replay_server.py # Replay historical_data.jsonl over local WSS
│ └── dashboard.py # Streamlit dashboard for trading_log.csv
├── .gitignore
├── README.md
└── build/ # Generated (ignored)
- Place headers in include/
- Implementation in src/
- Tests in tests/
- Update CMakeLists.txt targets as modules grow (consider add_library for core engine)
# After building (see Build & Run)
ctest --test-dir build -V
- Profile with perf, Valgrind, or Tracy
- Ensure -march=native matches production hardware
- Monitor cache misses and branch prediction in hot paths (order matching, risk checks)
- simdjson (v3.6.3): Zero-allocation JSON parsing
- GoogleTest (v1.15.2): Unit testing
- Boost: system + thread (required)
- OpenSSL: TLS for wss://
This repository includes:
- Comprehensive .gitignore for C++/CMake/IDE artifacts
- GitHub Actions CI (see .github/workflows/ci.yml) for automated builds/tests across platforms
- Modern CMake with dependency management and macOS/conda-friendly Boost discovery
- Add GitHub Actions workflow for:
  - Linux (Ubuntu) + macOS matrix
  - Release builds with sanitizers (-fsanitize=address,undefined)
  - Benchmarking and performance regression tests
- Use clang-tidy and include-what-you-use for static analysis
- Consider conan or vcpkg for full dependency management in larger projects
- More realistic order book model (multi-asset routing, sequencing/consistency checks)
- Polymarket Market Channel subscribe (assets_ids) + single-asset parser/replay filter
- Live reconnect + application-level PING/PONG + TLS verification policy
- Reconnect/backoff and ping/pong in the C++ websocket client
- Matching engine with low-latency priority queues
- Risk management and position tracking
- Benchmark suite (latency histograms, throughput)
- Python bindings (via pybind11) for research
This project is currently unlicensed (all rights reserved).
Not currently accepting external contributions.
Built for high-frequency prediction market execution. Questions? Open an issue.