Chapter 16: Multimodal / Vision-Language (Lighter)

Medium · Problem 2 / 3

Implement a projection adapter and interleave image and text embeddings into a single sequence with correct attention masking.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
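
One possible shape for a solution, as a minimal NumPy sketch rather than the expected answer. The problem does not define its skeleton, so every name here is an assumption: `project`, `interleave`, the `IMAGE_TOKEN_ID` placeholder convention, and `pad_id` are all hypothetical. The sketch also assumes that "correct attention masking" means a causal mask combined with a key-side padding mask; other conventions (for example, bidirectional attention among image patches) are equally plausible.

```python
import numpy as np

IMAGE_TOKEN_ID = -1  # hypothetical placeholder id marking image slots


def project(image_feats, weight, bias):
    """Linear adapter: (n_patches, d_img) -> (n_patches, d_model)."""
    return image_feats @ weight + bias


def interleave(token_ids, text_table, image_emb, pad_id=0):
    """Build one (seq_len, d_model) sequence and a (seq_len, seq_len) mask.

    token_ids:  1-D ints; IMAGE_TOKEN_ID marks slots reserved for patches.
    text_table: (vocab, d_model) text embedding table.
    image_emb:  (n_patches, d_model) already-projected image embeddings.
    """
    token_ids = np.asarray(token_ids)
    is_image = token_ids == IMAGE_TOKEN_ID
    if is_image.sum() != image_emb.shape[0]:
        raise ValueError("number of image slots must match number of patches")

    seq_len = token_ids.size
    seq = np.zeros((seq_len, text_table.shape[1]), dtype=text_table.dtype)
    # Text positions are looked up in the embedding table.
    seq[~is_image] = text_table[token_ids[~is_image]]
    # Patches fill the reserved slots in left-to-right order.
    seq[is_image] = image_emb

    # mask[q, k] is True where query position q may attend to key position k.
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    valid_key = token_ids != pad_id  # padding positions are never attended to
    mask = causal & valid_key[None, :]
    return seq, mask
```

Masking only the key side keeps every query row non-empty when padding sits on the right, which avoids all-masked rows in a downstream softmax. A batched version would add a leading batch dimension and pad the sequences to a common length.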
