An advanced, high-performance PyTorch implementation of the Titans architecture (Google Research, Jan 2025). This repo provides the tools to build models with Infinite Context using Neural Long-Term Memory.
- Core Concept
- Mathematical Foundation
- Architecture Comparison
- Variants Detailed
- Advanced Configuration
- Performance & Parallel Scan
- Project Structure
## Core Concept

Traditional Transformers rely on a fixed-size short-term memory: attention. As the sequence grows, the cost becomes quadratic ($O(T^2)$), and older information is eventually truncated.

Titans solve this by adding a Neural Memory branch. This branch is a deep MLP that acts as an associative store. For every new token, the model (see the sketch after this list):
- Reads from memory to get context.
- Computes the "surprise" (loss) of the new token.
- Updates its own weights via one step of gradient descent to "learn" the token.
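A minimal PyTorch sketch of this read/surprise/update loop, assuming a toy MLP memory and stand-in key/value projections (`memory`, `W_K`, `W_V`, and `theta` are illustrative names, not this repo's API):

```python
import torch
import torch.nn as nn

# Toy stand-ins: a small MLP memory and key/value projections
# (illustrative only, not this repo's actual modules).
memory = nn.Sequential(nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 64))
W_K = nn.Linear(64, 64, bias=False)
W_V = nn.Linear(64, 64, bias=False)
theta = 0.1  # inner-loop learning rate

def memory_step(x_t: torch.Tensor) -> torch.Tensor:
    read = memory(x_t)                       # 1. read from memory for context
    k_t, v_t = W_K(x_t), W_V(x_t)
    loss = (memory(k_t) - v_t).pow(2).sum()  # 2. "surprise" of the new token
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():                    # 3. one gradient step on the
        for p, g in zip(memory.parameters(), grads):  # memory's own weights
            p -= theta * g
    return read

out = memory_step(torch.randn(1, 64))
```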
## Mathematical Foundation

The Neural Memory update follows these core equations from the paper. With key/value projections $\mathbf{k}_t = x_t W_K$ and $\mathbf{v}_t = x_t W_V$, the memory $\mathcal{M}$ is trained at test time on the associative loss

$$\ell(\mathcal{M}_{t-1}; x_t) = \left\| \mathcal{M}_{t-1}(\mathbf{k}_t) - \mathbf{v}_t \right\|_2^2$$

Its weights are then updated through a momentum-based "surprise" term ($\eta$) and a forgetting gate ($\alpha$):

$$S_t = \eta_t\, S_{t-1} - \theta_t\, \nabla \ell(\mathcal{M}_{t-1}; x_t)$$

$$\mathcal{M}_t = (1 - \alpha_t)\, \mathcal{M}_{t-1} + S_t$$
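Read literally, the two update equations are a short recurrence. A runnable toy with constant gates (shapes and values are illustrative, not the repo's internals):

```python
import torch

d = 8
M = torch.zeros(d, d)               # memory weights M_{t-1} (toy view)
S = torch.zeros(d, d)               # surprise / momentum state S_{t-1}
eta, theta, alpha = 0.9, 0.1, 0.01  # constant stand-ins for η_t, θ_t, α_t

grad = torch.randn(d, d)            # stand-in for ∇ℓ(M_{t-1}; x_t)
S = eta * S - theta * grad          # S_t = η_t S_{t-1} − θ_t ∇ℓ(M_{t-1}; x_t)
M = (1 - alpha) * M + S             # M_t = (1 − α_t) M_{t-1} + S_t
```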
## Architecture Comparison

| Feature | Transformers | RNN / LSTM | Mamba / SSM | Titans (Ours) |
|---|---|---|---|---|
| Context Length | Fixed window | Infinite (but lossy) | Infinite | Infinite (High Fidelity) |
| Logic | Matching | Compression | Linear Dynamics | Test-Time Learning |
| Scaling | $O(T^2)$ | $O(T)$ | $O(T)$ | $O(T)$ (chunk-parallel) |
| Stability | Very High | Low | High | Very High |
## Variants Detailed

### MAC (Memory as Context)

The gold standard for long-context, RAG-style tasks.

- Workflow: Retrieve Memory -> Prepend to Attention -> Full Attention.
- Best for: Coding assistants, legal document analysis.
### MAG (Memory as Gate)

- Workflow: Attention and Memory branches run in parallel; their outputs are gated via a SiLU-based mechanism.
- Best for: Creative writing and reasoning where short-term and long-term context must blend.
### MAL (Memory as Layer)

- Workflow: The sequence is passed through Neural Memory, followed by a Sliding Window Attention layer.
- Best for: General-purpose LLMs seeking a balance between speed and precision.
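A minimal sketch contrasting the three data flows, using stand-in `attn` and `memory` modules (hypothetical simplifications, not this repo's API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 64
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)  # stand-in attention branch
memory = nn.Linear(d, d)  # stand-in for the Neural Memory branch

def mac(x):
    # Memory as Context: retrieved memory tokens are prepended, then full attention.
    ctx = memory(x)                  # stand-in for memory retrieval
    seq = torch.cat([ctx, x], dim=1)
    out, _ = attn(seq, seq, seq)
    return out[:, ctx.size(1):]      # keep outputs for the original tokens

def mag(x):
    # Memory as Gate: branches run in parallel, fused with a SiLU-based gate.
    a, _ = attn(x, x, x)
    return a * F.silu(memory(x))

def mal(x):
    # Memory as Layer: memory first, then attention (sliding-window in the real model).
    m = memory(x)
    out, _ = attn(m, m, m)
    return out

y = mac(torch.randn(2, 16, d))       # (batch=2, seq=16, d_model=64)
```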
## Advanced Configuration

Our `TitansConfig` allows for granular control over the memory dynamics:

```python
from titans.utils import TitansConfig
cfg = TitansConfig(
variant="MAC",
d_model=512,
n_layers=12,
mem_layers=2, # Depth of the internal Neural Memory MLP
n_persistent=16, # Constant tokens that stay in memory
chunk_size=64, # Parallelization chunk size (Inner-loop)
use_momentum=True, # Enable η surprise flow
use_decay=True # Enable α forgetting gate
)
```
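A quick usage sketch (the `TitansModel` name and import path below are assumptions for illustration, not confirmed API of this repo):

```python
import torch
# Hypothetical: `titans.models.TitansModel` is an assumed entry point.
from titans.models import TitansModel

model = TitansModel(cfg)                       # cfg from the snippet above
input_ids = torch.randint(0, 32000, (1, 256))  # toy batch of token ids
logits = model(input_ids)
```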
## Performance & Parallel Scan

In version 0.3.0, we implemented a Binary Tree Associative Scan.

Why it matters: Standard RNN-like updates must run token-by-token (one after another). Our associative scan allows the GPU to process entire chunks of a sequence at once by using the associative property of the linear recurrence, reducing the sequential depth from $O(T)$ to $O(\log T)$.
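As a minimal sketch of the idea, here is a log-depth doubling scan over the scalar recurrence $h_t = a_t h_{t-1} + b_t$ (illustrative only, not the repo's kernel):

```python
import torch

def associative_scan(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """All prefix states h_1..h_T of h_t = a_t * h_{t-1} + b_t (with h_0 = 0),
    computed in O(log T) sequential steps instead of O(T)."""
    a, b = a.clone(), b.clone()
    step, T = 1, a.size(0)
    while step < T:
        # Combine each element with the composite `step` positions earlier:
        # (A, B) followed by (a, b)  ->  (a*A, a*B + b)
        a_prev, b_prev = a[:-step], b[:-step]
        a[step:], b[step:] = a[step:] * a_prev, a[step:] * b_prev + b[step:]
        step *= 2
    return b  # with h_0 = 0, the accumulated offset equals the state

# Sanity check against the sequential, token-by-token recurrence:
T = 8
a, b = torch.rand(T), torch.randn(T)
h, ref = torch.zeros(()), []
for t in range(T):
    h = a[t] * h + b[t]
    ref.append(h)
torch.testing.assert_close(associative_scan(a, b), torch.stack(ref))
```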
## Project Structure

```
titans-memory/
├── titans/
│ ├── memory/ # Neural & Persistent Memory cores
│ ├── models/ # MAC, MAG, MAL, LMM variants
│ ├── ops/ # Parallel Associative Scan & Attention
│ └── utils/
│ ├── hf.py # HuggingFace Transformers wrapper
│ ├── training.py # DDP & Optimizer helpers
│ └── config.py # Unified TitansConfig
├── tests/ # Full test suite (51+ tests)
├── scripts/ # Weight conversion & local scripts
├── examples/ # Quickstart & Training demos
├── pyproject.toml # Build system & Dependencies
└── README.md
```
Developed with precision by the Neuranox team.