Chapter 16 · Multimodal / Vision-Language (Lighter)

Hard · Problem 3 / 3

Implement cross-attention adapter layers that let text tokens attend to frozen vision features, with gated residual injection.

Implement the function/class skeleton in the editor. Any correct approach is accepted.

solution.py (Python)

import torch
import torch.nn as nn

class GatedCrossAttn(nn.Module):

    def __init__(self, d_model, n_heads, d_vision):
        raise NotImplementedError

    def forward(self, x, vision, vision_mask=None):
        raise NotImplementedError
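
One possible way to fill in the skeleton is sketched below. It is a minimal sketch rather than a reference solution, and it rests on several assumptions that the problem statement does not spell out: `x` has shape `(batch, text_len, d_model)`, `vision` has shape `(batch, vis_len, d_vision)`, `vision_mask` is a boolean `(batch, vis_len)` tensor where `True` marks a valid vision token, the gate is a single zero-initialised scalar passed through `tanh`, and "frozen" is read as "no gradient flows back into the vision features".

```python
import torch
import torch.nn as nn


class GatedCrossAttn(nn.Module):
    """Illustrative gated cross-attention adapter (assumptions noted inline)."""

    def __init__(self, d_model, n_heads, d_vision):
        super().__init__()
        # Pre-norm on the text stream before it forms the queries.
        self.norm = nn.LayerNorm(d_model)
        # kdim/vdim let keys and values be projected from d_vision-sized inputs.
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, kdim=d_vision, vdim=d_vision, batch_first=True
        )
        # Assumption: scalar gate initialised to zero, so tanh(gate) = 0 and
        # the layer is an exact identity on x at initialisation.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x, vision, vision_mask=None):
        # Assumed shapes: x (B, T, d_model), vision (B, S, d_vision),
        # vision_mask (B, S) bool with True = valid token.
        # MultiheadAttention expects True = ignore, so invert the mask.
        pad = None if vision_mask is None else ~vision_mask.bool()
        # Assumption: detach() models "frozen" features (no gradient to the
        # vision encoder); the adapter's own projections remain trainable.
        feats = vision.detach()
        out, _ = self.attn(
            self.norm(x), feats, feats,
            key_padding_mask=pad, need_weights=False,
        )
        # Gated residual injection: text stream plus gated visual update.
        return x + torch.tanh(self.gate) * out
```

A quick shape check, with arbitrary sizes chosen only for illustration: `GatedCrossAttn(32, 4, 48)(torch.randn(2, 5, 32), torch.randn(2, 7, 48))` returns a tensor of shape `(2, 5, 32)`. One caveat of this sketch: if a sample has no valid vision tokens at all, the softmax over an entirely masked row produces NaN, so a caller would need to handle that case separately.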
