Medium

Implement best-of-n selection given a reward/verifier over sampled completions

Alignment Algorithms Zoo · Problem 3 of 5

Chapter 10: Alignment Algorithms Zoo


Implement best-of-n selection: given n sampled completions and a reward model or verifier that scores them, return the completion with the highest score.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
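One possible shape for a solution, offered as a minimal sketch and not as the reference answer. The name `best_of_n`, the single-argument `reward_fn` signature, and the first-wins tie-breaking rule are all assumptions made for illustration; the actual skeleton in the editor may differ, for example by passing the prompt to the scorer as well.

```python
from typing import Callable, Sequence, Tuple


def best_of_n(
    completions: Sequence[str],
    reward_fn: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the highest-scoring completion and its score.

    `reward_fn` is a hypothetical scorer: a reward model returning a float,
    or a verifier returning True/False (coerced to 1.0/0.0 below).
    """
    if not completions:
        raise ValueError("best_of_n needs at least one completion")

    # Score every candidate exactly once; the scorer is usually the costly call.
    scores = [float(reward_fn(c)) for c in completions]

    # max() keeps the first maximal element, so ties go to the earliest sample.
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return completions[best_idx], scores[best_idx]


if __name__ == "__main__":
    # Toy scorer (assumption): longer completions score higher.
    samples = ["a", "abc", "ab"]
    print(best_of_n(samples, len))
```

Scoring each completion once and then taking the argmax keeps the number of scorer calls at exactly n, which matters when the scorer is a model forward pass. With a boolean verifier, every passing completion ties at 1.0, so this sketch returns the first one that passes.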
