Medium

Implement best-of-$n$ selection given a reward/verifier over sampled completions

Chapter 10 · Alignment Algorithms Zoo · Problem 3 of 5

Implement best-of-$n$ selection over $n$ sampled completions, scoring them with either a reward function or a verifier.
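One way to state the two selection rules, as a sketch: write $y_1, \dots, y_n$ for the sampled completions, $r$ for the reward function, and $v$ for a verifier that returns 1 (accept) or 0 (reject). These symbols, breaking reward ties toward the earliest sample, and leaving the verifier result undefined when nothing passes are assumptions; the problem statement does not fix them.

$$
\hat{y}_{\text{reward}} = y_{i^{*}}, \qquad i^{*} = \min \Big( \mathop{\arg\max}_{i \in \{1,\dots,n\}} r(y_i) \Big)
$$

$$
\hat{y}_{\text{verifier}} = y_{j^{*}}, \qquad j^{*} = \min \{\, j \in \{1,\dots,n\} : v(y_j) = 1 \,\}
$$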

Implement the function skeleton below. Any correct approach is accepted.


solution.py


import numpy as np

def best_of_n(completions, reward_fn):
    raise NotImplementedError

def best_of_n_verifier(completions, verify_fn):
    raise NotImplementedError
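A minimal sketch of both functions, under conventions the skeleton leaves open and which a grader may define differently: `reward_fn` is assumed to map one completion to a scalar, `verify_fn` is assumed to return a truthy value for an accepted completion, both functions return the selected completion itself rather than its index, reward ties go to the earliest sample, and `best_of_n_verifier` returns `None` when no completion passes. The toy reward and verifier in the usage lines are hypothetical.

```python
import numpy as np

def best_of_n(completions, reward_fn):
    """Return the completion with the highest reward.

    Assumption: reward_fn maps a single completion to a scalar score.
    Ties go to the earliest completion, because np.argmax returns the
    first index of the maximum.
    """
    if len(completions) == 0:
        raise ValueError("best_of_n needs at least one completion")
    # Score every sampled completion once with the reward function.
    rewards = np.array([float(reward_fn(c)) for c in completions])
    best_idx = int(np.argmax(rewards))
    return completions[best_idx]

def best_of_n_verifier(completions, verify_fn):
    """Return the first completion the verifier accepts, or None.

    Assumption: verify_fn returns a truthy value for a correct completion.
    Returning None when nothing passes is an assumed convention.
    """
    for c in completions:
        # Stop at the first accepted sample; later ones are not checked.
        if verify_fn(c):
            return c
    return None

# Usage with a toy (hypothetical) reward and verifier.
samples = ["4", "5", "22"]
print(best_of_n(samples, lambda s: -abs(int(s) - 5)))    # → 5
print(best_of_n_verifier(samples, lambda s: s == "22"))  # → 22
```

The reward path scores all $n$ samples before taking the argmax, since every score is needed to find the maximum. The verifier path short-circuits on the first accepted sample, which saves calls when verification is expensive.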
