Implement the GRPO group-normalized advantage and the clipped token-level objective

Chapter 09: RLHF, RL & Preference Optimization (Core)

Medium · Problem 3 / 7

Implement the GRPO group-normalized advantage and the clipped token-level objective. [DeepSeek]

Implement the function/class skeleton in the editor. Any correct approach is accepted.
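For reference, the two quantities can be written out as follows. This is a sketch of the outcome-reward form of GRPO as commonly stated, not a specification taken from this page: the symbols $G$ (group size), $r_i$ (scalar reward of the $i$-th sampled completion $o_i$ for prompt $q$), $\rho_{i,t}$ (per-token probability ratio), and $\varepsilon$ (clip range, `eps` in the skeleton) are notation assumed here. The KL penalty toward a reference policy is left out because the skeleton takes no reference log-probabilities.

```latex
% Group-normalized advantage: every token of completion i shares the same value.
\hat{A}_{i} = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}

% Per-token probability ratio between the current and the sampling (old) policy.
\rho_{i,t} = \frac{\pi_{\theta}(o_{i,t} \mid q, o_{i,<t})}{\pi_{\theta_{\text{old}}}(o_{i,t} \mid q, o_{i,<t})}

% Clipped token-level objective (to be maximized; the loss is its negative).
J(\theta) = \frac{1}{G} \sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|}
  \min\!\Big( \rho_{i,t}\, \hat{A}_{i},\;
  \operatorname{clip}\big(\rho_{i,t},\, 1 - \varepsilon,\, 1 + \varepsilon\big)\, \hat{A}_{i} \Big)
```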

solution.py

import torch
import torch.nn.functional as F

def grpo_advantage(rewards):
    raise NotImplementedError

def grpo_loss(logp_new, logp_old, advantages, mask, eps=0.2):
    raise NotImplementedError
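
One possible way to fill in the skeleton is sketched below. It is not a reference solution from the problem set, and it rests on several assumptions the page does not state: `rewards` has shape `(num_prompts, group_size)`; `logp_new`, `logp_old`, and `mask` have shape `(num_sequences, seq_len)`; `advantages` holds one value per sequence (or is already per token); the standard deviation is the population one; and the loss averages tokens within each sequence before averaging over sequences. The `std_eps` argument is a hypothetical addition that guards against groups in which every reward is identical.

```python
import torch


def grpo_advantage(rewards, std_eps=1e-8):
    # Assumed shape: (num_prompts, group_size); each row is one group of
    # completions sampled for the same prompt.
    rewards = rewards.float()
    mean = rewards.mean(dim=-1, keepdim=True)
    # Population std (unbiased=False) is an assumption; the sample std is
    # an equally defensible choice.
    std = rewards.std(dim=-1, keepdim=True, unbiased=False)
    # std_eps keeps the division finite when all rewards in a group match.
    return (rewards - mean) / (std + std_eps)


def grpo_loss(logp_new, logp_old, advantages, mask, eps=0.2):
    # Assumed shapes: logp_new, logp_old, mask are (N, T);
    # advantages is (N,) per sequence or (N, T) per token.
    if advantages.dim() == logp_new.dim() - 1:
        # Broadcast the sequence-level advantage to every token.
        advantages = advantages.unsqueeze(-1)

    # Probability ratio pi_new / pi_old per token; the old policy is
    # treated as a constant.
    ratio = torch.exp(logp_new - logp_old.detach())

    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (element-wise minimum) surrogate, as in PPO.
    per_token = torch.min(unclipped, clipped)

    # Average over valid tokens within each sequence, then over sequences.
    mask = mask.to(per_token.dtype)
    per_seq = (per_token * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)

    # Negate because optimizers minimize while the objective is maximized.
    return -per_seq.mean()
```

With `rewards` of shape `(num_prompts, group_size)`, calling `grpo_advantage(rewards).flatten()` yields one advantage per sampled sequence, which can be passed directly as `advantages`, provided the sequences in the batch are ordered group by group.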
