Research Engineering & Debugging · Problem 6 of 8
Debug a broken GRPO implementation with wrong grouping, wrong masks, and wrong old-logprobs.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch
import torch.nn.functional as F

def grpo_loss(logits, old_logprobs, actions, group_ids, rewards, completion_mask, eps=0.2):
    raise NotImplementedError
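One possible way to fill in the skeleton is sketched below. It targets the three bug classes named in the problem statement: rewards are normalised within each `group_ids` group rather than across the whole batch, the loss is averaged only over tokens where `completion_mask` is set, and `old_logprobs` is treated as a detached constant from the sampling policy. The tensor shapes are assumptions, since the problem does not state them: `logits` is `(B, T, V)` and already aligned so that `logits[:, t]` scores `actions[:, t]`; `old_logprobs`, `actions`, and `completion_mask` are `(B, T)`; `group_ids` and `rewards` are `(B,)`. The population standard deviation, the `1e-6` stabiliser, and the per-sequence token averaging are also choices made here, and the KL penalty to a reference policy is left out because the signature carries no reference log-probs.

```python
import torch
import torch.nn.functional as F


def grpo_loss(logits, old_logprobs, actions, group_ids, rewards, completion_mask, eps=0.2):
    # Log-prob of each taken action under the current policy: (B, T).
    logprobs = F.log_softmax(logits.float(), dim=-1)
    logprobs = logprobs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

    # Map arbitrary group ids to contiguous indices 0..G-1.
    rewards = rewards.float()
    _, inverse = torch.unique(group_ids, return_inverse=True)
    num_groups = int(inverse.max().item()) + 1

    # Per-group mean and (population) std of the rewards.
    counts = rewards.new_zeros(num_groups).scatter_add_(0, inverse, torch.ones_like(rewards))
    sums = rewards.new_zeros(num_groups).scatter_add_(0, inverse, rewards)
    mean = sums / counts
    centered = rewards - mean[inverse]
    var = rewards.new_zeros(num_groups).scatter_add_(0, inverse, centered ** 2) / counts
    std = var.sqrt()

    # Group-relative advantage, one scalar per completion, no gradient.
    adv = (centered / (std[inverse] + 1e-6)).detach().unsqueeze(-1)  # (B, 1)

    # Importance ratio against the sampling policy; old log-probs are constants.
    ratio = torch.exp(logprobs - old_logprobs.detach())

    # PPO-style clipped surrogate, negated so that lower is better.
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv
    per_token = -torch.min(unclipped, clipped)

    # Average over completion tokens only, then over sequences.
    mask = completion_mask.to(per_token.dtype)
    per_seq = (per_token * mask).sum(-1) / mask.sum(-1).clamp(min=1)
    return per_seq.mean()
```

When `old_logprobs` equals the current log-probs, every ratio is 1 and the loss reduces to the negative mean advantage, which is zero whenever each group's advantages cancel. That gives a quick sanity check for the grouping and masking logic.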