Easy

Implement the Bradley–Terry pairwise reward-model loss in PyTorch

RLHF, RL & Preference Optimization (Core) · Problem 1 of 7

Chapter 09: RLHF, RL & Preference Optimization (Core)

Implement the Bradley–Terry pairwise reward-model loss in PyTorch.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
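
One possible approach, as a minimal sketch: under the Bradley–Terry model the probability that the chosen response beats the rejected one is sigmoid(r_chosen - r_rejected), so the loss is the mean of -log sigmoid(r_chosen - r_rejected) over the batch. The editor skeleton is not shown here, so the function name `bradley_terry_loss`, its argument names, and the assumption that rewards arrive as 1-D tensors of per-example scalars are all hypothetical.

```python
import torch
import torch.nn.functional as F


def bradley_terry_loss(
    chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor
) -> torch.Tensor:
    """Mean negative log-likelihood that each chosen response beats its rejected pair.

    Hypothetical signature: both inputs are assumed to have the same shape,
    with one scalar reward per preference pair.
    """
    # Bradley-Terry: P(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    margin = chosen_rewards - rejected_rewards
    # logsigmoid avoids the overflow/underflow of log(sigmoid(x)) for large |x|.
    return -F.logsigmoid(margin).mean()


# Toy batch of three preference pairs (illustrative values only).
chosen = torch.tensor([2.0, 0.5, 1.0])
rejected = torch.tensor([1.0, 0.5, 3.0])
loss = bradley_terry_loss(chosen, rejected)
```

When the two rewards in a pair are equal, the per-pair loss is log 2, and it shrinks toward zero as the chosen reward pulls ahead of the rejected one.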
