
Add challenge 100: Softmax Attention Backward (Medium)#265

Open
AaronToh wants to merge 2 commits into AlphaGPU:main from AaronToh:add-challenge-100-backward-attention

Conversation


@AaronToh (Contributor) commented May 9, 2026

# Summary

  • Adds challenge 100: Softmax Attention Backward, the backward pass through scaled dot-product attention.
  • Given Q (M×d), K (N×d), V (N×d), and the upstream gradient dO (M×d), the solver must compute dQ, dK, and dV by propagating gradients through the softmax Jacobian-vector product and the 1/√d scaling factor (a reference sketch follows this list). This is a natural follow-up to challenge 6 (Softmax Attention).
  • Includes challenge.py (eight functional tests covering edge cases, zeros, negatives, and power-of-2 and non-power-of-2 sizes, plus a performance test at M=512, N=256, d=128), challenge.html with the full backward-pass derivation, and starter files for all six frameworks.
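
For reviewers, here is a minimal NumPy sketch of the gradients the challenge asks for. The function name `attention_backward` and the M=4, N=5, d=8 shapes in the usage example are illustrative assumptions, not the challenge's starter API; the math (dV = Pᵀ·dO, the row-wise softmax Jacobian-vector product, and the 1/√d scaling) follows the summary above.

```python
import numpy as np

def attention_backward(Q, K, V, dO):
    """Gradients of O = softmax(Q K^T / sqrt(d)) V w.r.t. Q, K, V.

    Q: (M, d), K: (N, d), V: (N, d), dO: (M, d).
    Returns (dQ, dK, dV) with the same shapes as (Q, K, V).
    """
    d = Q.shape[-1]
    scale = 1.0 / np.sqrt(d)

    # Recompute the forward softmax (numerically stable, row-wise).
    S = Q @ K.T * scale                      # (M, N) scaled scores
    S -= S.max(axis=-1, keepdims=True)
    P = np.exp(S)
    P /= P.sum(axis=-1, keepdims=True)       # (M, N) attention weights

    # O = P @ V  =>  dV = P^T @ dO, dP = dO @ V^T
    dV = P.T @ dO                            # (N, d)
    dP = dO @ V.T                            # (M, N)

    # Softmax JVP per row: dS = P * (dP - rowsum(dP * P))
    dS = P * (dP - (dP * P).sum(axis=-1, keepdims=True))  # (M, N)

    # S = (Q @ K^T) * scale  =>  dQ = dS K * scale, dK = dS^T Q * scale
    dQ = dS @ K * scale                      # (M, d)
    dK = dS.T @ Q * scale                    # (N, d)
    return dQ, dK, dV

# Usage with hypothetical shapes M=4, N=5, d=8:
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
dO = rng.standard_normal((4, 8))
dQ, dK, dV = attention_backward(Q, K, V, dO)
```

Recomputing P from Q and K inside the backward function is the memory-saving choice; a solver that caches the forward softmax can take P as an input instead.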

