Attention Efficiency & Long Context · Problem 5 of 5
Implement YaRN RoPE scaling (frequency grouping + attention-logit scaling) and demonstrate length extrapolation on a toy retrieval task.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch, math
def yarn_freqs(d, base=10000.0, L=256, scale=8.0, alpha=1.0, beta=32.0):
raise NotImplementedError
def build_rope(inv_freq, T, mscale=1.0):
raise NotImplementedError
def rotate_half(x):
raise NotImplementedError
def apply_rope(x, cos, sin):
raise NotImplementedErrorReady when you are
Submit your solution and a structured review appears here — verdict, score, and concrete feedback. Any correct approach passes.
Implement YaRN RoPE scaling (frequency grouping + attention-logit scaling) and demonstrate length extrapolation on a toy retrieval task.
Implement the function/class skeleton in the editor. Any correct approach is accepted.