RLHF, RL & Preference Optimization (Core) · Problem 1 of 7
Implement the Bradley–Terry pairwise reward-model loss in PyTorch.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import torch
import torch.nn.functional as F
def bt_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    raise NotImplementedError
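For reference, the Bradley–Terry model scores a preference pair as P(chosen beats rejected) = sigmoid(r_chosen - r_rejected), so the per-pair loss is the negative log-likelihood, -log sigmoid(r_chosen - r_rejected). The sketch below is one possible way to fill in the skeleton, not the only accepted answer. It assumes both inputs hold one scalar reward per pair with matching shapes, and it assumes mean reduction over the batch, which the problem statement does not specify.

```python
import torch
import torch.nn.functional as F


def bt_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: mean of -log(sigmoid(r_chosen - r_rejected)).

    Assumption: inputs have the same shape (one reward per preference pair)
    and the batch is reduced with a mean.
    """
    # Margin by which the chosen response outscores the rejected one.
    margin = reward_chosen - reward_rejected
    # logsigmoid is numerically stabler than torch.log(torch.sigmoid(margin))
    # when the margin is large in magnitude.
    return -F.logsigmoid(margin).mean()


if __name__ == "__main__":
    # Two pairs: the first is ranked correctly, the second is not.
    chosen = torch.tensor([2.0, 0.5])
    rejected = torch.tensor([1.0, 1.5])
    print(bt_loss(chosen, rejected).item())
```

When the two rewards are equal the loss is log 2, it approaches zero as the chosen reward pulls ahead, and it grows roughly linearly as the rejected reward pulls ahead.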