Medium

Implement the GRPO group-normalized advantage and the clipped token-level objective

RLHF, RL & Preference Optimization (Core) · Problem 3 of 7

Chapter 09: RLHF, RL & Preference Optimization (Core)

Implement the GRPO group-normalized advantage and the clipped token-level objective. [DeepSeek]
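
For reference, the two quantities named in the task are usually written as follows. This is a sketch of the commonly cited GRPO formulation, not something given on this page: the symbols $G$ (group size), $r_i$ (scalar reward of completion $o_i$), $\varepsilon$ (clip range), and $\pi_{\theta_{\text{old}}}$ (the sampling policy) are assumptions, and the KL penalty to a reference policy that the full method adds is omitted here.

\[
\hat{A}_{i,t} = \hat{A}_i = \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G)},
\qquad
\rho_{i,t} = \frac{\pi_\theta(o_{i,t}\mid q,\, o_{i,<t})}{\pi_{\theta_{\text{old}}}(o_{i,t}\mid q,\, o_{i,<t})}
\]

\[
J(\theta) = \frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
\min\!\Big(\rho_{i,t}\,\hat{A}_{i},\ \operatorname{clip}\big(\rho_{i,t},\,1-\varepsilon,\,1+\varepsilon\big)\,\hat{A}_{i}\Big)
\]

Every token of completion $o_i$ shares the same advantage $\hat{A}_i$; only the ratio $\rho_{i,t}$ varies per token.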

Implement the function/class skeleton in the editor. Any correct approach is accepted.
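
One possible reference solution is sketched below in dependency-free Python, so the arithmetic is easy to check by hand. The function names, the `clip_eps` and `eps` parameters, and the list-of-lists layout for per-token log-probabilities are illustrative assumptions, not the skeleton from the editor. The sketch uses the population standard deviation, averages over tokens within each completion before averaging over the group, and leaves out the KL penalty term; a real training loop would typically do the same computation on batched tensors with a padding mask.

```python
import math
from typing import List, Sequence


def group_normalized_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Return one advantage per completion: (r_i - mean) / (std + eps).

    `rewards` holds the scalar rewards of all completions sampled for ONE prompt.
    `eps` (an assumed stabilizer) avoids division by zero when all rewards match.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    # Population variance (divide by G); some implementations divide by G - 1.
    var = sum((r - mean) ** 2 for r in rewards) / g
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_clipped_objective(
    new_logps: Sequence[Sequence[float]],
    old_logps: Sequence[Sequence[float]],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Clipped token-level surrogate (to be maximized), without the KL term.

    new_logps[i][t] / old_logps[i][t] are log-probs of token t of completion i
    under the current and the sampling policy. advantages[i] is shared by all
    tokens of completion i.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logps, old_logps, advantages):
        seq_sum = 0.0
        for n, o in zip(lp_new, lp_old):
            ratio = math.exp(n - o)  # importance ratio for this token
            clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            # Pessimistic bound: take the smaller of the two surrogates.
            seq_sum += min(ratio * adv, clipped * adv)
        total += seq_sum / len(lp_new)  # mean over tokens of completion i
    return total / len(new_logps)  # mean over the group


if __name__ == "__main__":
    adv = group_normalized_advantages([1.0, 0.0, 0.0, 1.0])
    logps = [[-1.0, -2.0], [-0.5], [-0.3, -0.7], [-1.2]]
    # With identical policies every ratio is 1, so the objective is mean(adv).
    print(adv, grpo_clipped_objective(logps, logps, adv))
```

To train with a gradient-descent optimizer, the loss would be the negative of this objective. Note the asymmetry of the clip: with a positive advantage the ratio's contribution is capped at `1 + clip_eps`, while with a negative advantage a large ratio is not clipped, because `min` keeps the more pessimistic value.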
