Hard

Implement cross-attention adapter layers letting text tokens attend to frozen vision features

Multimodal / Vision-Language (Lighter) · Problem 3 of 3

Chapter 16 · Multimodal / Vision-Language (Lighter)

Implement cross-attention adapter layers letting text tokens attend to frozen vision features with gated residual injection.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
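
Since any correct approach is accepted, the following is only one possible shape for a solution: a minimal sketch, assuming PyTorch. The class name `GatedCrossAttentionAdapter` and the parameters `text_dim`, `vision_dim`, `num_heads`, and `vision_padding_mask` are hypothetical and do not come from the problem's skeleton. The sketch assumes "gated residual injection" means adding the attention output to the text stream through a learnable scalar gate; a zero-initialized tanh gate is one common choice, because it makes the adapter an identity map at initialization. Treating the vision features as frozen is approximated here by detaching them inside `forward`.

```python
import torch
import torch.nn as nn


class GatedCrossAttentionAdapter(nn.Module):
    """Text tokens (queries) attend to vision features (keys and values).

    The attention output is added to the text stream through a tanh gate
    that starts at zero, so the layer is an identity map at initialization.
    All names here are illustrative assumptions, not the problem's skeleton.
    """

    def __init__(self, text_dim: int, vision_dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(text_dim)
        # kdim/vdim allow keys and values with a different width than queries.
        self.attn = nn.MultiheadAttention(
            embed_dim=text_dim,
            num_heads=num_heads,
            kdim=vision_dim,
            vdim=vision_dim,
            batch_first=True,
        )
        # Learnable scalar gate; tanh(0) = 0, so nothing is injected at first.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text, vision, vision_padding_mask=None):
        # text:   (batch, text_len, text_dim)
        # vision: (batch, vision_len, vision_dim)
        # vision_padding_mask: optional bool (batch, vision_len), True = ignore
        vision = vision.detach()  # no gradient flows back to the vision encoder
        attended, _ = self.attn(
            query=self.norm(text),
            key=vision,
            value=vision,
            key_padding_mask=vision_padding_mask,
            need_weights=False,
        )
        # Gated residual injection into the text stream.
        return text + torch.tanh(self.gate) * attended


if __name__ == "__main__":
    torch.manual_seed(0)
    layer = GatedCrossAttentionAdapter(text_dim=32, vision_dim=48)
    text = torch.randn(2, 5, 32)
    vision = torch.randn(2, 7, 48)
    out = layer(text, vision)
    print(out.shape, torch.allclose(out, text))
```

Because the gate starts at zero, the output equals the input text states until training moves the gate, which keeps a pretrained language model's behavior intact when the adapter is first inserted. A per-channel gate or an additional gated feed-forward block would be equally valid variations.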

Hints