Transformer Architecture Internals · Problem 7 of 7
Implement a numerically stable online-softmax attention pass (FlashAttention recurrence) in numpy that never materializes the full score matrix.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import numpy as np
def flash_attention(Q, K, V, causal=True, bq=64, bk=64):
raise NotImplementedErrorReady when you are
Submit your solution and a structured review appears here — verdict, score, and concrete feedback. Any correct approach passes.
Implement a numerically stable online-softmax attention pass (FlashAttention recurrence) in numpy that never materializes the full score matrix.
Implement the function/class skeleton in the editor. Any correct approach is accepted.