Implement a blocked/tiled attention forward pass (FlashAttention-style) with running max/sum

Chapter 03: Attention Efficiency & Long Context · Problem 4 of 5 · Hard


Implement a blocked/tiled attention forward pass (FlashAttention-style) with running max/sum and verify it matches naive attention.
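One common way to write the running max/sum bookkeeping is the recurrence below, in a form that defers the division by the softmax denominator until every key block has been processed. The notation is an assumption for illustration: a query block $Q_i$, key/value blocks $K_j, V_j$ for $j = 1, \dots, T$, head dimension $d$, exponentials taken elementwise, and the per-row vector $m$ broadcast across columns. By induction, after the last block $m^{(T)}$ is the true row max, $\ell^{(T)} = \operatorname{rowsum}(e^{S - m^{(T)}})$ over all keys, and $O^{(T)} = e^{S - m^{(T)}} V$, so the final division yields exactly $\operatorname{softmax}(S)V$.

```latex
\begin{aligned}
&m^{(0)} = -\infty, \qquad \ell^{(0)} = 0, \qquad O^{(0)} = 0 \\
&S^{(j)} = \frac{Q_i K_j^{\top}}{\sqrt{d}} \quad (\text{causally masked entries set to } -\infty) \\
&m^{(j)} = \max\!\big(m^{(j-1)},\ \operatorname{rowmax}(S^{(j)})\big) \\
&\ell^{(j)} = e^{m^{(j-1)} - m^{(j)}}\,\ell^{(j-1)} + \operatorname{rowsum}\!\big(e^{S^{(j)} - m^{(j)}}\big) \\
&O^{(j)} = \operatorname{diag}\!\big(e^{m^{(j-1)} - m^{(j)}}\big)\,O^{(j-1)} + e^{S^{(j)} - m^{(j)}}\,V_j \\
&O_i = \operatorname{diag}\!\big(\ell^{(T)}\big)^{-1} O^{(T)}
\end{aligned}
```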

Fill in the function skeleton below. Any correct approach is accepted.


solution.py


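import torch

def flash_attention(Q, K, V, block=16, causal=True):
    # Tiled forward pass: visit K/V in blocks of `block` keys while keeping a
    # running row max and running softmax denominator for each query row.
    raise NotImplementedError

def naive(Q, K, V, causal=True):
    # Reference: build the full score matrix, apply the causal mask if
    # requested, softmax over keys, then multiply by V.
    raise NotImplementedError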
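A minimal sketch of both functions follows, under assumptions the skeleton does not pin down: Q, K, V are float32 tensors of shape (..., N, d) (V may have a different last dimension), and causal masking means query i attends only to keys 0..i, which assumes K has the same sequence length as Q. The tiled version keeps a running row max `m`, a running sum `l`, and an unnormalized accumulator `acc` for each query block, rescaling the old statistics by `exp(m - m_new)` whenever a new block raises the max.

```python
import math

import torch


def naive(Q, K, V, causal=True):
    """Reference attention that materializes the full score matrix.

    Assumes Q, K, V have shape (..., N, d); this layout is an assumption.
    """
    d = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)
    if causal:
        # Query i may attend only to keys 0..i.
        q_idx = torch.arange(scores.shape[-2], device=Q.device)[:, None]
        k_idx = torch.arange(scores.shape[-1], device=Q.device)[None, :]
        scores = scores.masked_fill(k_idx > q_idx, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V


def flash_attention(Q, K, V, block=16, causal=True):
    """Tiled forward pass with a running row max and running softmax sum.

    Only a block x block tile of scores exists at any time. For causal=True,
    assumes K has the same sequence length as Q.
    """
    n_q, n_k, d = Q.shape[-2], K.shape[-2], Q.shape[-1]
    scale = 1.0 / math.sqrt(d)
    out = Q.new_empty((*Q.shape[:-1], V.shape[-1]))
    for qs in range(0, n_q, block):
        qe = min(qs + block, n_q)
        q = Q[..., qs:qe, :] * scale
        # Per-row running statistics for this query block.
        m = torch.full(q.shape[:-1], float("-inf"), dtype=Q.dtype, device=Q.device)
        l = torch.zeros(q.shape[:-1], dtype=Q.dtype, device=Q.device)
        acc = q.new_zeros((*q.shape[:-1], V.shape[-1]))  # unnormalized output
        # Under causal masking, keys at index >= qe are masked for every row here.
        k_end = qe if causal else n_k
        for ks in range(0, k_end, block):
            ke = min(ks + block, k_end)
            s = q @ K[..., ks:ke, :].transpose(-2, -1)
            if causal:
                q_idx = torch.arange(qs, qe, device=Q.device)[:, None]
                k_idx = torch.arange(ks, ke, device=Q.device)[None, :]
                s = s.masked_fill(k_idx > q_idx, float("-inf"))
            m_new = torch.maximum(m, s.amax(dim=-1))
            alpha = torch.exp(m - m_new)         # rescale factor for old stats
            p = torch.exp(s - m_new[..., None])  # unnormalized probabilities
            l = alpha * l + p.sum(dim=-1)
            acc = alpha[..., None] * acc + p @ V[..., ks:ke, :]
            m = m_new
        out[..., qs:qe, :] = acc / l[..., None]
    return out


if __name__ == "__main__":
    torch.manual_seed(0)
    Q, K, V = (torch.randn(2, 4, 100, 32) for _ in range(3))
    for causal in (True, False):
        got = flash_attention(Q, K, V, block=16, causal=causal)
        ref = naive(Q, K, V, causal=causal)
        print(f"causal={causal}: max abs diff {(got - ref).abs().max().item():.2e}")
```

Because query and key blocks share one block size and both start at index 0, every key block visited in the causal case begins at or before the first query row of the current block, so each row sees at least one unmasked key per block. That keeps `m` finite after the first block and avoids a NaN from `exp(-inf - (-inf))`. This reproduces the numerics of tiled attention only; plain Python loops do not deliver the speed or memory-traffic benefits of a fused GPU kernel.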

