Multimodal / Vision-Language (Lighter) · Problem 3 of 3
Implement cross-attention adapter layers letting text tokens attend to frozen vision features with gated residual injection.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch
import torch.nn as nn


class GatedCrossAttn(nn.Module):
    def __init__(self, d_model, n_heads, d_vision):
        raise NotImplementedError

    def forward(self, x, vision, vision_mask=None):
        raise NotImplementedError
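One possible way to fill in the skeleton is sketched below. It is not a reference solution, and it rests on several assumptions the problem statement does not spell out: `x` has shape `(batch, text_len, d_model)`, `vision` has shape `(batch, vision_len, d_vision)`, `vision_mask` is a boolean `(batch, vision_len)` tensor where `True` marks a valid vision token, the gate is a single zero-initialised scalar passed through `tanh`, and "frozen" is taken to mean that no gradient should flow back into the vision features.

```python
import torch
import torch.nn as nn


class GatedCrossAttn(nn.Module):
    def __init__(self, d_model, n_heads, d_vision):
        super().__init__()
        # Pre-norm on the text stream only (a design choice, not a requirement).
        self.norm = nn.LayerNorm(d_model)
        # kdim/vdim let keys and values keep the vision width d_vision,
        # so no separate projection layer is needed.
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, kdim=d_vision, vdim=d_vision, batch_first=True
        )
        # Zero-initialised gate: tanh(0) = 0, so the layer starts as identity.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x, vision, vision_mask=None):
        # Assumed shapes: x (B, T, d_model), vision (B, V, d_vision).
        # Assumed mask convention: vision_mask (B, V), True = valid token.
        key_padding_mask = None
        if vision_mask is not None:
            # nn.MultiheadAttention expects True = "ignore this key".
            key_padding_mask = ~vision_mask.bool()
        # detach() keeps gradients out of the (frozen) vision encoder.
        frozen = vision.detach()
        attn_out, _ = self.attn(
            self.norm(x), frozen, frozen,
            key_padding_mask=key_padding_mask, need_weights=False,
        )
        # Gated residual injection: text stream plus scaled vision context.
        return x + torch.tanh(self.gate) * attn_out


if __name__ == "__main__":
    layer = GatedCrossAttn(d_model=32, n_heads=4, d_vision=48)
    x = torch.randn(2, 5, 32)
    vision = torch.randn(2, 7, 48)
    mask = torch.ones(2, 7, dtype=torch.bool)
    mask[1, 4:] = False  # second sample has only 4 valid vision tokens
    print(layer(x, vision, mask).shape)
```

Because the gate starts at zero, the layer initially returns `x` unchanged, so it can be inserted into a pretrained text model without disturbing its outputs; the gate then learns how much vision context to inject. One caveat of this sketch: a sample whose vision tokens are all masked would produce NaNs in the attention softmax, so such rows would need separate handling.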