Easy

Implement the Bradley–Terry pairwise reward-model loss in PyTorch

RLHF, RL & Preference Optimization (Core) · Problem 1 of 7

Chapter 09: RLHF, RL & Preference Optimization (Core)

Implement the Bradley–Terry pairwise reward-model loss in PyTorch.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
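
For reference, the Bradley–Terry model is commonly written as below. The notation is an assumption, not part of the problem statement: $x$ is a prompt, $y_c$ and $y_r$ are the chosen and rejected responses, $r_\theta$ is the reward model, $\sigma$ is the logistic sigmoid, and $\mathcal{D}$ is the preference dataset.

```latex
P(y_c \succ y_r \mid x) = \sigma\bigl(r_\theta(x, y_c) - r_\theta(x, y_r)\bigr)

\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\, y_c,\, y_r) \sim \mathcal{D}}
    \Bigl[\log \sigma\bigl(r_\theta(x, y_c) - r_\theta(x, y_r)\bigr)\Bigr]
```

The loss depends only on the reward difference, so adding the same constant to both rewards leaves it unchanged.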

solution.py (Python)

import torch
import torch.nn.functional as F

def bt_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    raise NotImplementedError
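
One possible way to fill in the skeleton is sketched below. It assumes `reward_chosen` and `reward_rejected` are same-shape tensors of scalar rewards (one per preference pair) and that the loss is averaged over the batch; the problem statement does not specify the reduction, so the mean is an assumption here. The example tensors are made up for illustration.

```python
import torch
import torch.nn.functional as F

def bt_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry: P(chosen preferred) = sigmoid(r_chosen - r_rejected).
    margin = reward_chosen - reward_rejected
    # Negative log-likelihood of the observed preference.
    # F.logsigmoid is more numerically stable than torch.log(torch.sigmoid(x)).
    # Assumption: mean reduction over the batch.
    return -F.logsigmoid(margin).mean()

# Illustrative inputs (hypothetical values): three preference pairs.
chosen = torch.tensor([2.0, 0.5, 1.0])
rejected = torch.tensor([1.0, 0.5, 3.0])
loss = bt_loss(chosen, rejected)
```

When the two rewards are equal the per-pair loss is log 2, and it shrinks toward zero as the chosen reward pulls ahead of the rejected one.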
