RLHF, RL & Preference Optimization (Core) · Problem 2 of 7
Implement GAE (the backward recursion) given per-token rewards and value estimates.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import numpy as np
def compute_gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    raise NotImplementedError
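One possible way to fill in the skeleton is sketched below. It is a minimal version under stated assumptions, not a reference solution: the inputs describe a single trajectory with no episode boundary inside it (so no done mask is applied), `last_value` is taken to be the bootstrap value estimate for the state after the final token, and the function returns only the advantage array, since the problem statement does not say what the return value should be.

```python
import numpy as np


def compute_gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Sketch of GAE for one trajectory of length T.

    Assumptions (not stated in the problem): no terminal states inside
    the trajectory, `last_value` bootstraps the value after the last
    step, and only the advantages are returned.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    if rewards.shape != values.shape:
        raise ValueError("rewards and values must have the same shape")

    T = rewards.shape[0]
    advantages = np.zeros(T, dtype=np.float64)

    next_value = last_value  # V(s_{t+1}) for the final step
    gae = 0.0                # running advantage A_{t+1}

    # Walk backward so each step can reuse the advantage of the next one.
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]

    return advantages
```

If value-function targets are also needed, they can be derived from the result as `advantages + np.asarray(values)`. Setting `lam=0` reduces the estimate to the one-step TD residual, while `lam=1` gives the discounted sum of residuals, which makes both settings convenient sanity checks.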