Easy

Implement the DPO loss given policy/reference logprobs for chosen/rejected and β

Alignment Algorithms Zoo · Problem 1 of 5

Chapter 10: Alignment Algorithms Zoo

Implement the DPO (Direct Preference Optimization) loss, given the policy and reference log-probabilities of the chosen and rejected responses and the coefficient β.
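For reference, the loss is usually written per preference pair as below. The symbols are assumptions not fixed by the problem statement: x is the prompt, y_w the chosen response, y_l the rejected response, π_θ the policy, π_ref the reference model, and σ the logistic sigmoid.

```latex
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\Big(
      \beta \big[
        \big(\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)\big)
        - \big(\log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x)\big)
      \big]
    \Big)
```

When the policy equals the reference, the bracketed margin is zero and the loss is log 2 for any β.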

Complete the function/class skeleton in the editor. Any correct approach is accepted.
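One possible approach, as a minimal sketch rather than the expected solution: the page does not show the editor skeleton, so the function name `dpo_loss`, its argument names, the default `beta=0.1`, and the choice to average over the batch are all assumptions. It uses only the standard library and takes per-example sequence log-probabilities as plain floats. A tensor-library version would follow the same arithmetic.

```python
import math
from typing import Sequence


def dpo_loss(
    policy_chosen_logps: Sequence[float],
    policy_rejected_logps: Sequence[float],
    ref_chosen_logps: Sequence[float],
    ref_rejected_logps: Sequence[float],
    beta: float = 0.1,  # assumed default; the problem only says beta is given
) -> float:
    """Mean DPO loss over a batch of preference pairs (hypothetical signature)."""
    losses = []
    for pc, pr, rc, rr in zip(
        policy_chosen_logps, policy_rejected_logps,
        ref_chosen_logps, ref_rejected_logps,
    ):
        # Scaled margin between the policy-vs-reference log-ratios
        # of the chosen and rejected responses.
        logits = beta * ((pc - rc) - (pr - rr))
        # -log(sigmoid(z)) == softplus(-z); this form avoids overflow
        # when |z| is large.
        losses.append(max(-logits, 0.0) + math.log1p(math.exp(-abs(logits))))
    return sum(losses) / len(losses)
```

The loss drops below log 2 when the policy raises the chosen response relative to the reference more than it raises the rejected one, and rises above log 2 in the opposite case.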
