Mixture-Of-Experts · Problem 2 of 4
Implement a full sparse MoE FFN with capacity, token dropping, and gate-weighted combination.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    def __init__(self, d_model, d_ff):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError


class SparseMoE(nn.Module):
    def __init__(self, d_model, d_ff, n_experts, k, capacity_factor=1.25):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError
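The task names three mechanisms: capacity, token dropping, and gate-weighted combination. The following is a minimal pure-Python sketch of how they might fit together, not a reference solution. It rests on assumptions the problem does not state: capacity is taken as ceil(capacity_factor * n_tokens * k / n_experts), tokens claim expert slots in sequence order, and a token whose assignments are all dropped gets a zero output. The helper names (expert_capacity, route_with_capacity, combine) are hypothetical, and scalars stand in for d_model-sized vectors.

```python
import math


def expert_capacity(n_tokens, n_experts, k, capacity_factor=1.25):
    # Assumed convention: slots per expert =
    # ceil(capacity_factor * n_tokens * k / n_experts).
    return math.ceil(capacity_factor * n_tokens * k / n_experts)


def route_with_capacity(topk_experts, topk_gates, n_experts, capacity):
    # topk_experts[t] holds the k expert ids chosen for token t,
    # topk_gates[t] the matching gate weights.
    load = [0] * n_experts
    kept = []
    for experts, gates in zip(topk_experts, topk_gates):
        pairs = []
        for e, g in zip(experts, gates):
            if load[e] < capacity:  # expert still has a free slot
                load[e] += 1
                pairs.append((e, g))
            # otherwise this (token, expert) assignment is dropped
        kept.append(pairs)
    return kept, load


def combine(xs, kept, expert_fns):
    # Gate-weighted sum of the outputs of the experts that kept each token.
    # A fully dropped token yields 0.0 (assumption).
    return [
        sum((g * expert_fns[e](x) for e, g in pairs), 0.0)
        for x, pairs in zip(xs, kept)
    ]


# Toy run: 4 tokens, 2 experts, top-1 routing, capacity factor 1.0.
capacity = expert_capacity(n_tokens=4, n_experts=2, k=1, capacity_factor=1.0)
kept, load = route_with_capacity(
    topk_experts=[[0], [0], [0], [1]],
    topk_gates=[[0.9], [0.8], [0.7], [0.6]],
    n_experts=2,
    capacity=capacity,
)
outputs = combine(
    [1.0, 2.0, 3.0, 4.0],
    kept,
    [lambda x: 2 * x, lambda x: x + 1],  # stand-in "experts"
)
```

In the toy run, expert 0 has two slots, so the third token routed to it is dropped and contributes nothing to the output. A tensor implementation would typically vectorize the same bookkeeping, for example with a cumulative sum over a one-hot assignment mask, rather than loop over tokens.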