Chapter 03: Attention Efficiency & Long Context

Super-hard · Problem 5 / 5

Implement YaRN RoPE scaling (frequency grouping + attention-logit scaling) and demonstrate length extrapolation on a toy retrieval task.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
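For orientation, the two ingredients named in the task can be written out as below. This is a sketch of the commonly cited YaRN formulation, not a statement of what the grader checks. The mapping of symbols onto the skeleton's parameters is an assumption: $b$ is `base`, $L$ is the original (training) context length, $s$ is `scale`, and $\alpha$, $\beta$ are the ramp bounds.

```latex
% Standard RoPE frequencies for head dimension d
\theta_i = b^{-2i/d}, \qquad i = 0, \dots, \tfrac{d}{2} - 1

% Number of full rotations dimension i completes over the original context
r_i = \frac{L \, \theta_i}{2\pi}

% Frequency grouping: ramp between "interpolate" (0) and "keep as is" (1)
\gamma(r) =
\begin{cases}
0 & r < \alpha \\
1 & r > \beta \\
\dfrac{r - \alpha}{\beta - \alpha} & \text{otherwise}
\end{cases}

% Blended frequencies
\theta'_i = \bigl(1 - \gamma(r_i)\bigr) \frac{\theta_i}{s} + \gamma(r_i) \, \theta_i

% Attention-logit scaling: multiply both q and k by m,
% so the logits are scaled by m^2
m = 0.1 \ln s + 1
```

Low-frequency dimensions (fewer than $\alpha$ rotations over $L$) are fully interpolated, high-frequency dimensions (more than $\beta$ rotations) are left untouched, and the ones in between are blended linearly.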

solution.py

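import math

import torch

def yarn_freqs(d, base=10000.0, L=256, scale=8.0, alpha=1.0, beta=32.0):
    # Per-pair inverse frequencies for head dim d, after YaRN frequency grouping.
    raise NotImplementedError

def build_rope(inv_freq, T, mscale=1.0):
    # cos/sin tables for positions 0..T-1, with mscale as the attention-logit scaling factor.
    raise NotImplementedError

def rotate_half(x):
    # Helper that rearranges the last dimension of x for the rotation.
    raise NotImplementedError

def apply_rope(x, cos, sin):
    # Apply the rotary embedding to x using the precomputed cos/sin tables.
    raise NotImplementedError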
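One possible way to fill in the skeleton is sketched below. It is a minimal illustration, not the reference solution. Several choices are assumptions rather than anything the problem states: `L` is treated as the original context length, `yarn_freqs` returns a `(inv_freq, mscale)` tuple, the halves-based `rotate_half` layout is used, and `x` is assumed to have shape `(..., T, d)`.

```python
import math

import torch


def yarn_freqs(d, base=10000.0, L=256, scale=8.0, alpha=1.0, beta=32.0):
    """Return (inv_freq, mscale). The tuple return is an assumption of this sketch."""
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    # Full rotations each dimension completes over the original context L.
    rotations = L * inv_freq / (2 * math.pi)
    # Ramp: 0 means fully interpolate (divide by scale), 1 means keep unchanged.
    gamma = ((rotations - alpha) / (beta - alpha)).clamp(0.0, 1.0)
    yarn_inv_freq = (1.0 - gamma) * inv_freq / scale + gamma * inv_freq
    # Attention-logit scaling factor, applied to both q and k via cos/sin.
    mscale = 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0
    return yarn_inv_freq, mscale


def build_rope(inv_freq, T, mscale=1.0):
    """Return cos and sin tables of shape (T, d), each multiplied by mscale."""
    pos = torch.arange(T, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)        # (T, d/2)
    emb = torch.cat([angles, angles], dim=-1)  # (T, d)
    return emb.cos() * mscale, emb.sin() * mscale


def rotate_half(x):
    """Map the two halves (x1, x2) of the last dimension to (-x2, x1)."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat([-x2, x1], dim=-1)


def apply_rope(x, cos, sin):
    """Rotate x of shape (..., T, d) using cos/sin tables of shape (T, d)."""
    return x * cos + rotate_half(x) * sin


# Usage: build tables for a sequence 8x longer than the assumed training length.
inv_freq, mscale = yarn_freqs(d=64)
cos, sin = build_rope(inv_freq, T=2048, mscale=mscale)
q = torch.randn(1, 2048, 64)
q_rot = apply_rope(q, cos, sin)
```

Folding `mscale` into the cos/sin tables means both queries and keys pick up the factor without touching the attention code. For the extrapolation demo, one option is to train a small attention model on the retrieval task at length `L` with plain RoPE, then evaluate at lengths above `L` with tables built from plain and from YaRN frequencies and compare retrieval accuracy.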
