Multimodal / Vision-Language (Lighter) · Problem 1 of 3
Implement patch embedding (conv or unfold linear) converting an image tensor to patch tokens.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch
import torch.nn as nn
class PatchEmbed(nn.Module):
def __init__(self, in_chans=3, patch_size=16, embed_dim=768):
raise NotImplementedError
def forward(self, x):
raise NotImplementedError
class PatchEmbedUnfold(nn.Module):
def __init__(self, in_chans=3, patch_size=16, embed_dim=768):
raise NotImplementedError
def forward(self, x):
raise NotImplementedErrorReady when you are
Submit your solution and a structured review appears here — verdict, score, and concrete feedback. Any correct approach passes.
Implement patch embedding (conv or unfold linear) converting an image tensor to patch tokens.
Implement the function/class skeleton in the editor. Any correct approach is accepted.