Super-hard

Implement YaRN RoPE scaling (frequency grouping + attention-logit scaling) and demonstrate length extrapolation

Attention Efficiency & Long Context · Problem 5 of 5

Chapter 03: Attention Efficiency & Long Context

Implement YaRN RoPE scaling (frequency grouping + attention-logit scaling) and demonstrate length extrapolation on a toy retrieval task.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
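As a starting point, the two YaRN ingredients named in the task can be sketched as follows. This is a minimal sketch, not a reference solution. It assumes the formulation commonly given for YaRN: each rotary pair is classified by how many full rotations it completes within the original context, a linear ramp between `alpha` and `beta` blends position interpolation with the unmodified frequency, and queries and keys are scaled by `0.1 * ln(scale) + 1`. The function names, the defaults (`alpha=1`, `beta=32`, `orig_ctx=2048`), and the interleaved pair layout are assumptions made for illustration. The toy retrieval task is not included.

```python
import math

def yarn_inv_freq(head_dim, base=10000.0, scale=4.0, orig_ctx=2048,
                  alpha=1.0, beta=32.0):
    """Per-pair rotary frequencies after YaRN's blended interpolation."""
    inv_freq = []
    for i in range(head_dim // 2):
        theta = base ** (-2.0 * i / head_dim)            # original RoPE frequency
        rotations = orig_ctx * theta / (2.0 * math.pi)   # full turns over the original context
        # Ramp: 0 means interpolate fully (theta / scale), 1 means leave untouched.
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        inv_freq.append((1.0 - gamma) * theta / scale + gamma * theta)
    return inv_freq

def yarn_mscale(scale):
    """Factor applied to both q and k; logits end up scaled by its square."""
    return 1.0 if scale <= 1.0 else 0.1 * math.log(scale) + 1.0

def apply_rope(vec, pos, inv_freq, mscale=1.0):
    """Rotate interleaved pairs (vec[2i], vec[2i+1]) by pos * inv_freq[i]."""
    out = []
    for i, freq in enumerate(inv_freq):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        out += [mscale * (x * c - y * s), mscale * (x * s + y * c)]
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    inv = yarn_inv_freq(head_dim=64, scale=4.0)
    m = yarn_mscale(4.0)
    q = [0.1 * (j + 1) for j in range(64)]
    k = [0.05 * (64 - j) for j in range(64)]
    # The logit depends only on the relative offset between positions.
    print(dot(apply_rope(q, 5, inv, m), apply_rope(k, 3, inv, m)))
    print(dot(apply_rope(q, 4005, inv, m), apply_rope(k, 4003, inv, m)))
```

Because `mscale` multiplies both the query and the key, the attention logit is scaled by `mscale ** 2`, which is one way to realize the attention-logit scaling without touching the softmax code. High-frequency pairs (many rotations within `orig_ctx`) keep their original frequency, while low-frequency pairs are divided by `scale`.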

Hints