Chapter 06: Infrastructure, Distributed Training & Scaling

Super-hard · Problem 4 / 4

Implement tensor-parallel linear layers (column-parallel then row-parallel) with correct forward/backward collectives in PyTorch.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
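
Before touching any collectives, it can help to confirm the sharding arithmetic in a single process. The sketch below is an illustration only, under assumptions the problem statement does not specify: a ReLU activation, no biases, weights stored in the `torch.nn.Linear` layout `(out_features, in_features)`, and hypothetical sizes `world=4`, `d_model=8`, `d_ff=16`. It splits the first weight along its output dimension (column-parallel) and the second along its input dimension (row-parallel), then sums the per-shard partial outputs, which is the role an all-reduce would play across ranks.

```python
import torch

torch.manual_seed(0)
world, d_model, d_ff = 4, 8, 16  # hypothetical sizes for illustration

x = torch.randn(3, d_model, dtype=torch.float64)
W1 = torch.randn(d_ff, d_model, dtype=torch.float64)  # first layer, (out, in)
W2 = torch.randn(d_model, d_ff, dtype=torch.float64)  # second layer, (out, in)

# Dense reference: two linear layers with an elementwise activation between.
dense = torch.relu(x @ W1.T) @ W2.T

# Column-parallel: each simulated rank owns a slice of W1's output features.
W1_shards = W1.chunk(world, dim=0)
# Row-parallel: each simulated rank owns the matching slice of W2's input features.
W2_shards = W2.chunk(world, dim=1)

# Each simulated rank computes a full-width partial output from its own shards.
partials = [torch.relu(x @ a.T) @ b.T for a, b in zip(W1_shards, W2_shards)]

# Summing the partials stands in for the forward all-reduce.
parallel = torch.stack(partials).sum(dim=0)

assert torch.allclose(dense, parallel)
```

The check passes because the activation is elementwise, so it can be applied to each column shard independently and no communication is needed between the two layers.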

solution.py (Python)

import torch
import torch.distributed as dist
from torch.autograd import Function

class _CopyToModelParallel(Function):

    @staticmethod
    def forward(ctx, x):
        raise NotImplementedError

    @staticmethod
    def backward(ctx, grad):
        raise NotImplementedError

class _ReduceFromModelParallel(Function):

    @staticmethod
    def forward(ctx, x):
        raise NotImplementedError

    @staticmethod
    def backward(ctx, grad):
        raise NotImplementedError

class ColumnParallelLinear(torch.nn.Module):

    def __init__(self, d_in, d_out, world, rank):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError

class RowParallelLinear(torch.nn.Module):

    def __init__(self, d_in, d_out, world, rank):
        raise NotImplementedError

    def forward(self, x_shard):
        raise NotImplementedError

class ParallelFFN(torch.nn.Module):

    def __init__(self, d_model, d_ff, world, rank):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError
