Diffusion & Non-Autoregressive Language Models · Problem 3 of 4
Implement a minimal masked-diffusion training loss (sample a mask rate, mask tokens, cross-entropy on masked positions) for a tiny transformer.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDiffusionLM(nn.Module):
    def __init__(self, vocab, d=128, n_layers=4, n_heads=4, max_len=64):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError


def diffusion_loss(model, x0, mask_id, eps=0.001):
    raise NotImplementedError
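
One possible way to fill in the skeleton is sketched below. It is not a reference solution, and it rests on several assumptions the problem statement does not confirm: `vocab` is taken to include the mask token id, `x0` is a `(batch, length)` tensor of `long` token ids with `length <= max_len`, the mask rate `t` is drawn uniformly from `[eps, 1)` once per sequence, and the masked cross-entropy is weighted by `1/t` (a common choice for a linear masking schedule; a plain mean over masked positions would also match the task description). The encoder uses PyTorch's built-in `nn.TransformerEncoder` without a causal mask, so every position attends in both directions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDiffusionLM(nn.Module):
    def __init__(self, vocab, d=128, n_layers=4, n_heads=4, max_len=64):
        super().__init__()
        # Assumption: `vocab` counts every token id, including the mask id.
        self.tok = nn.Embedding(vocab, d)
        self.pos = nn.Embedding(max_len, d)  # learned positional embeddings
        layer = nn.TransformerEncoderLayer(
            d_model=d, nhead=n_heads, dim_feedforward=4 * d,
            dropout=0.0, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d, vocab)

    def forward(self, x):
        # x: (B, L) token ids -> logits: (B, L, vocab).
        # No attention mask is passed, so attention is bidirectional.
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok(x) + self.pos(pos)
        return self.head(self.encoder(h))


def diffusion_loss(model, x0, mask_id, eps=0.001):
    B, L = x0.shape
    # Sample one mask rate per sequence, t in [eps, 1); eps keeps 1/t finite.
    t = eps + (1.0 - eps) * torch.rand(B, 1, device=x0.device)
    # Mask each token independently with probability t.
    masked = torch.rand(B, L, device=x0.device) < t
    xt = torch.where(masked, torch.full_like(x0, mask_id), x0)
    logits = model(xt)
    # F.cross_entropy expects the class dimension second: (B, vocab, L).
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")  # (B, L)
    # Keep only masked positions; the 1/t weight is an assumption (see above).
    return (ce * masked / t).sum() / (B * L)
```

A quick check under these assumptions: reserve the last id for the mask, for example `vocab = 11` and `mask_id = 10` with data tokens in `0..9`, then call `diffusion_loss(model, x0, mask_id).backward()` on a random batch and confirm the loss is a finite scalar.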