Medium

Implement grouped-query attention (GQA) in PyTorch by repeating/broadcasting KV heads across query-head groups

Attention Efficiency & Long Context · Problem 3 of 5

Chapter 03: Attention Efficiency & Long Context

Implement grouped-query attention (GQA) in PyTorch by repeating/broadcasting KV heads across query-head groups, and verify it matches full multi-head attention when num_kv_heads equals num_query_heads.

Complete the function/class skeleton in the editor. Any correct approach is accepted.
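
One possible approach is sketched below to make the task concrete. It is a minimal sketch and not the skeleton from the editor: the names `repeat_kv`, `grouped_query_attention`, and `mha_reference`, the `(batch, heads, seq, head_dim)` tensor layout, and the absence of masking, dropout, and projection layers are all assumptions made for illustration. Each KV head is repeated `num_query_heads // num_kv_heads` times so that consecutive query heads share one KV head, and the result is compared against a plain per-head attention loop when the two head counts are equal.

```python
import math
import torch


def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat each KV head n_rep times along the head dimension.

    (B, H_kv, S, D) -> (B, H_kv * n_rep, S, D). Copies of one KV head are
    adjacent, so query head h ends up paired with KV head h // n_rep.
    """
    if n_rep == 1:
        return x
    return x.repeat_interleave(n_rep, dim=1)


def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Unmasked GQA. q: (B, H_q, T, D); k, v: (B, H_kv, S, D). Assumed layout."""
    num_query_heads, num_kv_heads = q.shape[1], k.shape[1]
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be divisible by num_kv_heads")
    n_rep = num_query_heads // num_kv_heads

    # Expand K and V so every query head has a matching KV head.
    k = repeat_kv(k, n_rep)
    v = repeat_kv(v, n_rep)

    # Standard scaled dot-product attention over the expanded heads.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    weights = torch.softmax(scores, dim=-1)
    return weights @ v


def mha_reference(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Plain multi-head attention, one head at a time (requires H_q == H_kv)."""
    outputs = []
    for h in range(q.shape[1]):
        scores = q[:, h] @ k[:, h].transpose(-2, -1) / math.sqrt(q.shape[-1])
        outputs.append(torch.softmax(scores, dim=-1) @ v[:, h])
    return torch.stack(outputs, dim=1)


# Equivalence check: with num_kv_heads == num_query_heads, GQA reduces to MHA.
torch.manual_seed(0)
q_demo = torch.randn(2, 8, 5, 16)
k_demo = torch.randn(2, 8, 5, 16)
v_demo = torch.randn(2, 8, 5, 16)
assert torch.allclose(
    grouped_query_attention(q_demo, k_demo, v_demo),
    mha_reference(q_demo, k_demo, v_demo),
    atol=1e-5,
)
```

`repeat_interleave` materializes the repeated heads, which is simple but uses extra memory. Reshaping the queries to `(B, H_kv, n_rep, T, D)` and letting the matmul broadcast over the group dimension is an alternative that avoids the copy.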

Hints