Hard

Implement grouped-query attention with configurable KV heads plus a KV cache for incremental decoding

Transformer Architecture Internals · Problem 5 of 7

Chapter 01 · Transformer Architecture Internals

Implement grouped-query attention with configurable KV heads plus a KV cache for incremental decoding.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
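
One possible shape for a solution is sketched below in NumPy. It is not the site's reference skeleton: the class name GroupedQueryAttention, the reset_cache method, the use_cache flag, and the random weight initialisation are all assumptions made for illustration. The sketch projects queries to n_heads heads and keys/values to n_kv_heads heads, stores only the n_kv_heads keys and values in the cache, and repeats each KV head across its group of query heads just before the attention product. A causal mask uses absolute positions so that cached and uncached calls agree.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class GroupedQueryAttention:
    """Illustrative GQA layer (hypothetical names, random weights)."""

    def __init__(self, d_model, n_heads, n_kv_heads, seed=0):
        assert d_model % n_heads == 0, "d_model must divide evenly into heads"
        assert n_heads % n_kv_heads == 0, "query heads must group evenly over KV heads"
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.group = n_heads // n_kv_heads  # query heads sharing one KV head
        kv_dim = n_kv_heads * self.head_dim
        rng = np.random.default_rng(seed)
        std = d_model ** -0.5
        self.wq = rng.normal(0.0, std, (d_model, d_model))
        self.wk = rng.normal(0.0, std, (d_model, kv_dim))
        self.wv = rng.normal(0.0, std, (d_model, kv_dim))
        self.wo = rng.normal(0.0, std, (d_model, d_model))
        self.reset_cache()

    def reset_cache(self):
        # Each cache has shape (batch, n_kv_heads, cached_len, head_dim) once filled.
        self.k_cache = None
        self.v_cache = None

    def _split(self, x, n):
        # (batch, seq, n * head_dim) -> (batch, n, seq, head_dim)
        b, t, _ = x.shape
        return x.reshape(b, t, n, self.head_dim).transpose(0, 2, 1, 3)

    def __call__(self, x, use_cache=False):
        b, t, _ = x.shape
        q = self._split(x @ self.wq, self.n_heads)
        k = self._split(x @ self.wk, self.n_kv_heads)
        v = self._split(x @ self.wv, self.n_kv_heads)

        if use_cache:
            # Append the new keys/values to the cache along the sequence axis.
            if self.k_cache is not None:
                k = np.concatenate([self.k_cache, k], axis=2)
                v = np.concatenate([self.v_cache, v], axis=2)
            self.k_cache, self.v_cache = k, v

        total = k.shape[2]   # cached + new positions
        past = total - t     # number of positions that came from the cache

        # Expand KV heads so query head h uses KV head h // group.
        k = np.repeat(k, self.group, axis=1)
        v = np.repeat(v, self.group, axis=1)

        scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(self.head_dim)

        # Causal mask on absolute positions: query at past + i sees keys <= past + i.
        q_pos = past + np.arange(t)[:, None]
        k_pos = np.arange(total)[None, :]
        scores = np.where(k_pos <= q_pos, scores, -np.inf)

        out = softmax(scores) @ v                          # (b, n_heads, t, head_dim)
        out = out.transpose(0, 2, 1, 3).reshape(b, t, -1)  # merge heads
        return out @ self.wo
```

A useful self-check under these assumptions is to run a whole sequence once without the cache, then feed the same sequence one token at a time with use_cache=True after calling reset_cache, and compare the outputs with np.allclose. Setting n_kv_heads equal to n_heads gives ordinary multi-head attention, and n_kv_heads = 1 gives multi-query attention; the cache holds n_kv_heads heads in every case, which is where the memory saving comes from.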