Hard

Chapter 15: Diffusion & Non-Autoregressive Language Models

Problem 3 / 4

Implement a minimal masked-diffusion training loss (sample a mask rate, mask tokens, cross-entropy on masked positions) for a tiny transformer.

Fill in the function/class skeleton in the editor. Any correct approach is accepted.
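One possible shape for a solution is sketched below in PyTorch. It follows the three steps named in the problem: sample a mask rate, mask tokens, and take cross-entropy on the masked positions. Everything beyond those steps is an assumption made for illustration: the names `TinyTransformer` and `masked_diffusion_loss`, the model sizes, the `eps` lower bound on the mask rate, one mask rate per sequence, and the optional `elbo_weight` flag, which applies a 1/t weighting sometimes used with a linear masking schedule. The exercise's own skeleton may use different signatures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTransformer(nn.Module):
    """Hypothetical bidirectional encoder that predicts a token at every position."""

    def __init__(self, vocab_size, d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        # vocab_size is assumed to already include the extra [MASK] token id.
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            dropout=0.0, batch_first=True,
        )
        # No causal mask: every position attends to the whole sequence.
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        positions = torch.arange(tokens.size(1), device=tokens.device)
        h = self.tok_emb(tokens) + self.pos_emb(positions)
        return self.head(self.encoder(h))  # (batch, seq_len, vocab_size)


def masked_diffusion_loss(model, tokens, mask_id, eps=1e-3, elbo_weight=False):
    """Cross-entropy on randomly masked positions of `tokens` (batch, seq_len)."""
    batch, seq_len = tokens.shape

    # 1. Sample one mask rate t per sequence, uniform in [eps, 1).
    t = eps + (1.0 - eps) * torch.rand(batch, 1, device=tokens.device)

    # 2. Mask each position independently with probability t.
    is_masked = torch.rand(batch, seq_len, device=tokens.device) < t
    noisy = torch.where(is_masked, torch.full_like(tokens, mask_id), tokens)

    # 3. Per-position cross-entropy against the clean tokens,
    #    zeroed out wherever the input was left unmasked.
    logits = model(noisy)
    ce = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")
    ce = ce * is_masked

    if elbo_weight:
        # Assumed variant: weight each sequence by 1/t and normalize by the
        # total number of tokens rather than the number of masked tokens.
        return (ce / t).sum() / (batch * seq_len)

    # Default: plain mean over masked positions (clamp avoids division by zero
    # in the rare case that nothing was masked).
    return ce.sum() / is_masked.sum().clamp(min=1)
```

A call might look like `masked_diffusion_loss(model, tokens, mask_id=vocab_size - 1)`, where the last vocabulary id is reserved for the mask token and `tokens` never contains it. Reserving the last id is a convention chosen here, not something the problem prescribes.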
