Medium

Implement GAE (the backward recursion) given per-token rewards and value estimates

RLHF, RL & Preference Optimization (Core) · Problem 2 of 7

Chapter 09: RLHF, RL & Preference Optimization (Core)

Implement the function/class skeleton in the editor. Any correct approach is accepted.

solution.py (Python)

import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    raise NotImplementedError
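
One possible way to fill in the skeleton is sketched below. It is an illustration under stated assumptions, not the reference solution for this problem. The problem statement does not fix the return type or how episode boundaries are handled, so the sketch assumes: `rewards` and `values` are 1-D sequences of equal length for a single response, there are no mid-sequence terminal flags, `last_value` is the value estimate for the state after the final token, and the function returns a pair `(advantages, returns)` with `returns = advantages + values`. The recursion is the standard one: `delta_t = r_t + gamma * V_{t+1} - V_t`, then `A_t = delta_t + gamma * lam * A_{t+1}`, computed from the last token backward.

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Sketch of GAE for one trajectory (assumed: 1-D inputs, no done flags)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    if rewards.ndim != 1 or rewards.shape != values.shape:
        raise ValueError("rewards and values must be 1-D with the same length")

    T = rewards.shape[0]
    advantages = np.zeros(T, dtype=np.float64)

    next_value = last_value  # V_{t+1}; bootstrap value past the final token
    next_adv = 0.0           # A_{t+1}; zero beyond the end of the sequence

    # Walk backward so A_{t+1} is already known when A_t is computed.
    for t in range(T - 1, -1, -1):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        next_adv = delta + gamma * lam * next_adv            # GAE recursion
        advantages[t] = next_adv
        next_value = values[t]

    # Assumed convention: value-function targets are advantages plus values.
    returns = advantages + values
    return advantages, returns


# With gamma = lam = 1 and zero values, each advantage is the sum of
# the remaining rewards.
adv, ret = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
print(adv)  # [3. 2. 1.]
```

Setting `lam=0` reduces each advantage to its one-step TD residual, while `lam=1` gives the discounted sum of remaining rewards (plus the bootstrap term) minus the value estimate, which makes both settings convenient sanity checks.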
