Under Construction: This project is actively being developed and is not yet ready for production use. APIs and features may change without notice.
Episodic Memory & Semantic Cache for LLM Responses
Because nobody likes paying for the same token twice.
```
██████╗ ███████╗███████╗██╗     ███████╗██╗  ██╗
██╔══██╗██╔════╝██╔════╝██║     ██╔════╝╚██╗██╔╝
██████╔╝█████╗  █████╗  ██║     █████╗   ╚███╔╝
██╔══██╗██╔══╝  ██╔══╝  ██║     ██╔══╝   ██╔██╗
██║  ██║███████╗███████╗███████╗███████╗██╔╝ ██╗
╚═╝  ╚═╝╚══════╝╚══════╝╚══════╝╚══════╝╚═╝  ╚═╝
```
Reflex is an OpenAI-compatible HTTP cache for LLM responses: it sits between your agent/app and the provider, returning cached answers instantly and storing misses for later reuse. Cached responses are returned in Tauq format to reduce token overhead.
```bash
# 1. Start Qdrant (vector database)
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant

# 2. Run Reflex (HTTP server)
cargo run -p reflex-server --release

# 3. Point your agent to localhost:8080
export OPENAI_BASE_URL=http://localhost:8080/v1
```
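Once the server is running, any OpenAI-style client can exercise the cache by pointing its base URL at Reflex. The sketch below is one way to do that from Rust; it assumes `reqwest` (with the `blocking` and `json` features) and `serde_json` as dependencies, the standard `/v1/chat/completions` route implied by OpenAI compatibility, and a placeholder model name.

```rust
// Minimal sketch: send the same request twice through Reflex.
// Assumes reqwest = { features = ["blocking", "json"] } and serde_json.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    let body = json!({
        "model": "gpt-4o-mini", // placeholder model name
        "messages": [{ "role": "user", "content": "What is a semantic cache?" }]
    });

    for attempt in 1..=2 {
        // The first call is a miss and is forwarded to the provider;
        // the second, identical call should be answered from the cache.
        let resp = client
            .post("http://localhost:8080/v1/chat/completions")
            .bearer_auth(std::env::var("OPENAI_API_KEY")?)
            .json(&body)
            .send()?;
        println!("attempt {attempt}: {}", resp.status());
        println!("{}", resp.text()?); // cache hits come back in Tauq format
    }
    Ok(())
}
```

The first call should be forwarded to the provider; repeating it should come back from the cache, in Tauq format, without paying the provider again.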
```bash
# Run the library example (no HTTP server)
cargo run -p reflex-cache --example basic_lookup --features mock
```

Embed in your own app:

```toml
[dependencies]
reflex = { package = "reflex-cache", version = "x.x.x" }
```

- Server + binary (`reflex`): `crates/reflex-server`
- Core library (embedded use): `crates/reflex-cache` (docs.rs: https://docs.rs/reflex-cache)
```
Request → L1 (exact) → L2 (semantic) → L3 (rerank/verify) → Provider
```
- L1: exact match (fast, in-memory)
- L2: semantic retrieval (Qdrant vector search)
- L3: verification (cross-encoder rerank to avoid false positives)
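As a rough sketch of that flow (an illustration only, not Reflex's actual internals), a tiered lookup amounts to something like the following, with `semantic_search` and `rerank_accepts` standing in for the Qdrant query and the cross-encoder check:

```rust
// Conceptual sketch of the tiered lookup described above, not Reflex's
// actual implementation. `semantic_search` and `rerank_accepts` are
// stand-ins for the Qdrant query and the cross-encoder check.
use std::collections::HashMap;

struct TieredCache {
    exact: HashMap<String, String>, // L1: exact prompt -> cached response
}

impl TieredCache {
    fn lookup(&self, prompt: &str) -> Option<String> {
        // L1: exact match (fast, in-memory).
        if let Some(hit) = self.exact.get(prompt) {
            return Some(hit.clone());
        }
        // L2: semantic retrieval: nearest-neighbour search over embeddings,
        // returning candidate (stored prompt, cached response) pairs.
        let candidates = semantic_search(prompt);
        // L3: verification: rerank candidates so a merely similar-looking
        // prompt does not get served someone else's answer.
        candidates
            .into_iter()
            .find(|(stored_prompt, _)| rerank_accepts(prompt, stored_prompt))
            .map(|(_, response)| response)
        // On a miss the caller forwards the request to the provider and
        // stores the fresh response for later reuse.
    }
}

// Stand-ins for the vector store and the cross-encoder.
fn semantic_search(_prompt: &str) -> Vec<(String, String)> {
    Vec::new()
}
fn rerank_accepts(_query: &str, _stored: &str) -> bool {
    false
}

fn main() {
    let cache = TieredCache { exact: HashMap::new() };
    assert_eq!(cache.lookup("hello"), None); // empty cache: every lookup misses
}
```

The L3 verification step is what keeps a semantically close but genuinely different question from being served the wrong cached answer.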
Development checks:

```bash
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check
```

Reflex: Stop paying for the same token twice.
Built with Rust, Qdrant, and a healthy disdain for redundant API calls.