Chapter 07 · Mixture-Of-Experts

Medium · Problem 2 / 4

Implement a full sparse MoE FFN with capacity, token dropping, and gate-weighted combination.

Implement the function/class skeleton in the editor. Any correct approach is accepted.

solution.py (Python)

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):

    def __init__(self, d_model, d_ff):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError

class SparseMoE(nn.Module):

    def __init__(self, d_model, d_ff, n_experts, k, capacity_factor=1.25):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError
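
One possible way to fill in the skeleton is sketched below. It is a minimal sketch, not a reference solution, and several details are assumptions the problem statement does not fix: the expert is a two-layer FFN with GELU, the router is a bias-free linear layer followed by softmax, the top-k gate values are renormalized to sum to 1, capacity is taken as ceil(capacity_factor * n_tokens * k / n_experts), tokens past an expert's capacity are dropped in token order, and fully dropped tokens produce a zero output (on the assumption that a residual connection outside the layer carries them through).

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """Two-layer feed-forward expert (GELU is an assumed activation)."""

    def __init__(self, d_model, d_ff):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))


class SparseMoE(nn.Module):
    def __init__(self, d_model, d_ff, n_experts, k, capacity_factor=1.25):
        super().__init__()
        self.n_experts = n_experts
        self.k = k
        self.capacity_factor = capacity_factor
        # Assumed router: a bias-free linear layer producing one logit per expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([Expert(d_model, d_ff) for _ in range(n_experts)])

    def forward(self, x):
        orig_shape = x.shape
        tokens = x.reshape(-1, orig_shape[-1])            # (T, d_model)
        n_tokens = tokens.shape[0]
        # Assumed capacity rule: each expert accepts at most this many tokens.
        capacity = max(1, math.ceil(self.capacity_factor * n_tokens * self.k / self.n_experts))

        probs = F.softmax(self.router(tokens), dim=-1)    # (T, n_experts)
        gate, expert_idx = probs.topk(self.k, dim=-1)     # both (T, k)
        gate = gate / gate.sum(dim=-1, keepdim=True)      # renormalize over the chosen experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Rows of expert_idx that selected expert e, and which top-k slot did so.
            token_ids, slot = (expert_idx == e).nonzero(as_tuple=True)
            # nonzero returns indices in row-major order, so slicing keeps the
            # earliest tokens and drops the overflow.
            token_ids, slot = token_ids[:capacity], slot[:capacity]
            if token_ids.numel() == 0:
                continue
            weighted = gate[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
            # Gate-weighted combination: accumulate each expert's contribution per token.
            out.index_add_(0, token_ids, weighted)
        return out.reshape(orig_shape)
```

The per-expert Python loop keeps the dispatch logic easy to follow; a batched scatter/gather dispatch would likely be faster but is harder to read. Calling `SparseMoE(d_model=8, d_ff=16, n_experts=4, k=2)(torch.randn(2, 5, 8))` returns a tensor with the same shape as the input.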
