Alignment Algorithms Zoo · Problem 1 of 5
Implement the DPO loss given policy and reference log-probabilities for the chosen and rejected responses and β (the `beta` argument).
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch
import torch.nn.functional as F

# w = chosen response, l = rejected response; pi = policy, ref = reference.
def dpo_loss(pi_logps_w, pi_logps_l, ref_logps_w, ref_logps_l, beta=0.1):
    raise NotImplementedError
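
One possible approach, offered as a minimal sketch rather than the reference solution: compute the policy-to-reference log-ratio for the chosen and rejected responses, scale their difference by beta, and return the negative log-sigmoid of that margin. The name `dpo_loss_sketch`, the mean reduction over the batch, and the example tensors are assumptions made for illustration; the problem statement does not specify a reduction.

```python
import torch
import torch.nn.functional as F


def dpo_loss_sketch(pi_logps_w, pi_logps_l, ref_logps_w, ref_logps_l, beta=0.1):
    # Hypothetical name; same arguments as the skeleton's dpo_loss.
    # Log-ratio of policy to reference for the chosen (w) and rejected (l) responses.
    chosen_logratio = pi_logps_w - ref_logps_w
    rejected_logratio = pi_logps_l - ref_logps_l
    # Margin scaled by beta: positive when the policy prefers the chosen
    # response more strongly than the reference does.
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(logits)), computed stably; mean over the batch is an assumption.
    return -F.logsigmoid(logits).mean()


# Made-up log-probabilities for a batch of two preference pairs.
pi_w = torch.tensor([-1.0, -2.0])
pi_l = torch.tensor([-3.0, -2.5])
ref_w = torch.tensor([-1.5, -2.0])
ref_l = torch.tensor([-2.5, -2.5])

loss = dpo_loss_sketch(pi_w, pi_l, ref_w, ref_l)
# When the policy equals the reference, every margin is zero and the loss is log(2).
baseline = dpo_loss_sketch(ref_w, ref_l, ref_w, ref_l)
```

Using `F.logsigmoid` instead of `torch.log(torch.sigmoid(...))` avoids numerical underflow when the margin is strongly negative.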