Alignment Algorithms Zoo · Problem 3 of 5
Implement best-of-N selection: given N sampled completions and a reward function or verifier, select the best completion.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
```python
import numpy as np


def best_of_n(completions, reward_fn):
    raise NotImplementedError


def best_of_n_verifier(completions, verify_fn):
    raise NotImplementedError
```
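One possible solution is sketched below. It assumes that `reward_fn` maps a single completion to a scalar score (higher is better) and that `verify_fn` returns a truthy value when it accepts a completion. Two behaviors are assumptions rather than part of the problem statement: ties go to the earliest completion, and `best_of_n_verifier` returns `None` when no completion passes.

```python
import numpy as np


def best_of_n(completions, reward_fn):
    """Return the completion with the highest reward.

    Assumption: reward_fn(completion) returns a scalar, higher is better.
    Ties go to the earliest completion, because np.argmax returns the
    first index of the maximum value.
    """
    if len(completions) == 0:
        raise ValueError("completions must be non-empty")
    # Score each completion once, then select the index of the best score.
    rewards = np.array([float(reward_fn(c)) for c in completions])
    return completions[int(np.argmax(rewards))]


def best_of_n_verifier(completions, verify_fn):
    """Return the first completion that the verifier accepts.

    Assumption: verify_fn(completion) is truthy for an accepted completion.
    Returning None when nothing passes is an assumed convention.
    """
    for c in completions:
        # Stop at the first accepted sample to skip remaining verifier calls.
        if verify_fn(c):
            return c
    return None


completions = ["4", "5", "four"]
print(best_of_n(completions, reward_fn=len))  # prints four
print(best_of_n_verifier(completions, verify_fn=lambda c: c == "5"))  # prints 5
```

The verifier version returns as soon as it finds an accepted sample. This helps when verification is expensive, for example when it runs unit tests. A version that scores every completion and takes the argmax of the verifier outputs would also fit the problem, since any correct approach is accepted.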