Super-hard

Implement a numerically stable online-softmax attention pass (FlashAttention recurrence) i

Transformer Architecture Internals · Problem 7 of 7

Chapter 01Transformer Architecture Internals

Implement a numerically stable online-softmax attention pass (FlashAttention recurrence) i

Super-hardProblem 7 / 7

Implement a numerically stable online-softmax attention pass (FlashAttention recurrence) in numpy that never materializes the full n×nn\times n score matrix.

Implement the function/class skeleton in the editor. Any correct approach is accepted.

Hints