Mixture-Of-Experts · Problem 3 of 4
Implement the load-balancing auxiliary loss and a training step demonstrating that it equalizes expert usage on synthetic data.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch
import torch.nn as nn
import torch.nn.functional as F
def load_balance_loss(logits, k, n_experts):
    raise NotImplementedError

def demo():
    raise NotImplementedError
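The problem does not say which formulation of the auxiliary loss is expected, so the following is only a minimal sketch of one common choice, the Switch-Transformer-style loss `n_experts * sum_i f_i * p_i`. Here `f_i` is the fraction of top-k routing slots assigned to expert `i` and `p_i` is the mean router probability for expert `i`. The router (a single `nn.Linear`), the bias skew toward expert 0, and every hyperparameter in `demo` (`n_tokens`, `d_model`, `steps`, the learning rate, the seed) are assumptions made for illustration and are not part of the original skeleton.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def load_balance_loss(logits, k, n_experts):
    # logits: (n_tokens, n_experts) raw router scores.
    probs = F.softmax(logits, dim=-1)
    # Indices of the k highest-probability experts per token: (n_tokens, k).
    topk_idx = probs.topk(k, dim=-1).indices
    # f[i]: fraction of the n_tokens * k routing slots that went to expert i.
    # This is a hard count, so no gradient flows through it.
    counts = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
    f = counts / topk_idx.numel()
    # p[i]: mean router probability for expert i; this term carries the gradient.
    p = probs.mean(dim=0)
    return n_experts * torch.sum(f * p)


def demo(n_experts=4, k=1, n_tokens=512, d_model=16, steps=300, seed=0):
    # All defaults here are illustrative assumptions.
    torch.manual_seed(seed)
    x = torch.randn(n_tokens, d_model)  # synthetic token features
    router = nn.Linear(d_model, n_experts)
    # Skew the router toward expert 0 so usage starts out unbalanced.
    with torch.no_grad():
        router.bias[0] += 3.0
    opt = torch.optim.SGD(router.parameters(), lr=0.5)

    def usage():
        # Fraction of routing slots each expert currently receives.
        with torch.no_grad():
            idx = router(x).topk(k, dim=-1).indices
            counts = torch.bincount(idx.flatten(), minlength=n_experts)
            return counts.float() / idx.numel()

    before = usage()
    for _ in range(steps):
        opt.zero_grad()
        # Train on the auxiliary loss alone to isolate its effect.
        loss = load_balance_loss(router(x), k, n_experts)
        loss.backward()
        opt.step()
    return before, usage()
```

Calling `before, after = demo()` returns the per-expert usage fractions before and after training, which can be printed side by side to compare. In this formulation the loss equals 1 whenever the router probabilities are uniform, and it approaches `n_experts` when all tokens are routed to a single expert with high confidence. A real MoE layer would add this term, scaled by a small coefficient, to the task loss instead of optimizing it alone.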