Hard

Implement a single MLA layer (down-proj to latent, up-proj, decoupled RoPE)

Transformer Architecture Internals · Problem 6 of 7

Chapter 01 · Transformer Architecture Internals

Implement a single Multi-head Latent Attention (MLA) layer: down-projection to a latent, up-projection from that latent, and decoupled RoPE.

Implement the function/class skeleton in the editor. Any correct approach is accepted.

solution.py (Python)

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLA(nn.Module):

    def __init__(self, d, n_heads, d_c, d_c_q, dh_nope, dh_rope):
        raise NotImplementedError

    def forward(self, x, cos, sin):
        raise NotImplementedError
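
One possible way to fill in the skeleton is sketched below. It is a minimal sketch under several assumptions that the problem statement does not confirm: the value head size equals `dh_nope`, `cos` and `sin` have shape `(T, dh_rope)` in the rotate-half layout, attention is causal, biases are omitted, and no normalization is applied to the latents. The names `MLASketch`, `rotate_half`, and `apply_rope` are hypothetical, and `F.scaled_dot_product_attention` requires PyTorch 2.0 or later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rotate_half(t):
    # Split the last dim into halves (a, b) and return (-b, a).
    a, b = t.chunk(2, dim=-1)
    return torch.cat((-b, a), dim=-1)


def apply_rope(t, cos, sin):
    # t: (B, H, T, dh_rope); cos/sin: (T, dh_rope), broadcast over B and H.
    return t * cos + rotate_half(t) * sin


class MLASketch(nn.Module):
    def __init__(self, d, n_heads, d_c, d_c_q, dh_nope, dh_rope):
        super().__init__()
        self.h, self.dn, self.dr = n_heads, dh_nope, dh_rope
        # Query path: down-project to a latent, then up-project to all heads
        # (non-RoPE and RoPE parts are produced by one matrix and split later).
        self.w_dq = nn.Linear(d, d_c_q, bias=False)
        self.w_uq = nn.Linear(d_c_q, n_heads * (dh_nope + dh_rope), bias=False)
        # Key/value path: a shared latent of size d_c.
        self.w_dkv = nn.Linear(d, d_c, bias=False)
        self.w_uk = nn.Linear(d_c, n_heads * dh_nope, bias=False)
        # Assumption: value head size equals dh_nope.
        self.w_uv = nn.Linear(d_c, n_heads * dh_nope, bias=False)
        # Decoupled RoPE key: one small key shared by every head.
        self.w_kr = nn.Linear(d, dh_rope, bias=False)
        self.w_o = nn.Linear(n_heads * dh_nope, d, bias=False)

    def forward(self, x, cos, sin):
        B, T, _ = x.shape
        h, dn, dr = self.h, self.dn, self.dr

        # Queries: (B, h, T, dn + dr), split into non-RoPE and RoPE parts.
        q = self.w_uq(self.w_dq(x)).view(B, T, h, dn + dr).transpose(1, 2)
        q_nope, q_rope = q.split([dn, dr], dim=-1)

        # Keys and values are up-projected from the shared latent c_kv.
        c_kv = self.w_dkv(x)  # (B, T, d_c)
        k_nope = self.w_uk(c_kv).view(B, T, h, dn).transpose(1, 2)
        v = self.w_uv(c_kv).view(B, T, h, dn).transpose(1, 2)

        # RoPE is applied only to the decoupled parts; the shared RoPE key
        # is broadcast to all heads.
        k_rope = self.w_kr(x).view(B, T, 1, dr).transpose(1, 2)
        q_rope = apply_rope(q_rope, cos, sin)
        k_rope = apply_rope(k_rope, cos, sin).expand(B, h, T, dr)

        q = torch.cat((q_nope, q_rope), dim=-1)
        k = torch.cat((k_nope, k_rope), dim=-1)

        # Assumption: causal masking. The default scale is
        # 1 / sqrt(dn + dr), the full per-head query/key width.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, h * dn)
        return self.w_o(out)
```

With this layout, only `c_kv` and the shared RoPE key would need to be cached at inference time, which is the point of the latent down-projection. The sketch recomputes everything on each call and does not implement a cache.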
