Multimodal / Vision-Language (Lighter) · Problem 2 of 3
Implement a projection adapter and interleave image-text embeddings into one sequence with correct attention masking.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch
import torch.nn as nn


class Projector(nn.Module):
    def __init__(self, d_v, d_model):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError


def interleave(text_embeds, text_mask, img_embeds, image_token_pos):
    raise NotImplementedError
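One possible way to fill in the skeleton is sketched below. The problem statement does not fix tensor shapes or return values, so everything in this sketch is an assumption: `text_embeds` is `(B, T, D)`, `text_mask` is `(B, T)` with 1 for real tokens and 0 for padding, `img_embeds` is `(B, N, D)` and already projected, `image_token_pos` is a `(B,)` integer tensor giving the index of a single image placeholder token per sample, and `interleave` returns an `(embeds, mask)` pair. The two-layer MLP is likewise just one common choice for the adapter; a single linear layer would also satisfy the description.

```python
import torch
import torch.nn as nn


class Projector(nn.Module):
    """Maps vision features of width d_v to the language model width d_model."""

    def __init__(self, d_v, d_model):
        super().__init__()
        # Assumption: a small MLP adapter; the exercise does not prescribe one.
        self.net = nn.Sequential(
            nn.Linear(d_v, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, x):
        # x: (B, N, d_v) -> (B, N, d_model)
        return self.net(x)


def interleave(text_embeds, text_mask, img_embeds, image_token_pos):
    """Replace one placeholder token per sample with that sample's N image tokens.

    Assumed shapes (not given by the exercise):
      text_embeds (B, T, D), text_mask (B, T), img_embeds (B, N, D),
      image_token_pos (B,) holding the placeholder index for each sample.
    Returns embeddings (B, T - 1 + N, D) and a mask (B, T - 1 + N).
    """
    n_img = img_embeds.shape[1]
    # Image tokens are always real tokens, so their mask entries are all ones.
    img_on = torch.ones(n_img, dtype=text_mask.dtype, device=text_mask.device)

    embeds, masks = [], []
    for b in range(text_embeds.shape[0]):
        p = int(image_token_pos[b])
        # Text before the placeholder, then the image tokens, then the remaining text.
        embeds.append(
            torch.cat([text_embeds[b, :p], img_embeds[b], text_embeds[b, p + 1:]], dim=0)
        )
        # The mask is spliced the same way, so padding stays masked out.
        masks.append(
            torch.cat([text_mask[b, :p], img_on, text_mask[b, p + 1:]], dim=0)
        )
    return torch.stack(embeds), torch.stack(masks)


if __name__ == "__main__":
    proj = Projector(d_v=8, d_model=4)
    img = proj(torch.randn(2, 3, 8))            # (2, 3, 4)
    text = torch.randn(2, 5, 4)
    mask = torch.tensor([[1, 1, 1, 1, 0], [1, 1, 1, 0, 0]])
    out, out_mask = interleave(text, mask, img, torch.tensor([1, 0]))
    print(out.shape, out_mask.shape)
```

Because every sample swaps exactly one placeholder for the same number of image tokens, all output rows have equal length and can be stacked without extra padding. If the grader expects a different convention (for example, inserting image tokens without removing a placeholder, or a causal rather than padding mask), the splice indices and mask construction would need to change accordingly.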