Easy

Implement the DPO loss given policy/reference logprobs for chosen/rejected and $\beta$


Chapter 10: Alignment Algorithms Zoo

Easy · Problem 1 / 5

Implement the DPO (Direct Preference Optimization) loss, given the policy and reference log-probabilities of the chosen and rejected responses and the coefficient $\beta$.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
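
For reference, the standard DPO objective can be written as below. The notation is an assumption layered on the problem statement: $y_w$ is the chosen response, $y_l$ the rejected one, $\pi_\theta$ the policy, $\pi_{\mathrm{ref}}$ the reference model, and $\sigma$ the logistic sigmoid.

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\!\left(\beta\left[\left(\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)\right) - \left(\log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x)\right)\right]\right)\right]
$$

Under this reading, the four log-probability terms would correspond to the skeleton's `pi_logps_w`, `ref_logps_w`, `pi_logps_l`, and `ref_logps_l` arguments, in that order.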


solution.py (Python)

import torch
import torch.nn.functional as F

def dpo_loss(pi_logps_w, pi_logps_l, ref_logps_w, ref_logps_l, beta=0.1):
    raise NotImplementedError
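
One possible way to fill in the skeleton is sketched below. It is not the only accepted answer, and it rests on assumptions the problem does not state: each argument is a tensor of per-example sequence log-probabilities (already summed over tokens) with matching shapes, and the loss is averaged over the batch.

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_logps_w, pi_logps_l, ref_logps_w, ref_logps_l, beta=0.1):
    # Log-ratio of policy to reference for the chosen (w) and rejected (l) responses.
    chosen_logratio = pi_logps_w - ref_logps_w
    rejected_logratio = pi_logps_l - ref_logps_l

    # Margin between the two log-ratios, scaled by beta.
    logits = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(logits)); logsigmoid is used for numerical stability.
    # Mean reduction over the batch is an assumption, not a stated requirement.
    return -F.logsigmoid(logits).mean()
```

A quick sanity check: when the policy and reference assign identical log-probabilities, the margin is zero and the loss equals $\log 2 \approx 0.693$. Raising the policy's log-probability on the chosen response relative to the reference lowers the loss.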

