Medium

Implement GAE (the backward recursion) given per-token rewards and value estimates

RLHF, RL & Preference Optimization (Core) · Problem 2 of 7

Chapter 09 · RLHF, RL & Preference Optimization (Core)

Implement Generalized Advantage Estimation (GAE) using its backward recursion, given per-token rewards and per-token value estimates.
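
For reference, the backward recursion is usually written as below. The symbols are assumptions, since the problem statement does not fix notation: r_t is the reward at token t, V_t the value estimate, γ the discount factor, λ the GAE parameter, and T the sequence length. The boundary values assume the episode ends at the last token, so nothing is bootstrapped past it.

```latex
\begin{aligned}
\delta_t &= r_t + \gamma V_{t+1} - V_t, \qquad V_T = 0,\\
\hat{A}_t &= \delta_t + \gamma \lambda \, \hat{A}_{t+1}, \qquad \hat{A}_T = 0,\\
\hat{R}_t &= \hat{A}_t + V_t, \qquad t = T-1, \dots, 0.
\end{aligned}
```

Setting λ = 0 reduces the advantage to the one-step TD error δ_t, while λ = 1 gives the discounted Monte Carlo return minus V_t.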

Implement the function/class skeleton in the editor. Any correct approach is accepted.
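
One possible approach is sketched below in plain Python. The function name `compute_gae`, its signature, and the defaults for `gamma`, `lam`, and `last_value` are hypothetical, since the editor skeleton is not shown here. The sketch assumes a single sequence with one reward and one value estimate per token, and it treats the value after the final token as `last_value` (0.0 for a terminated episode).

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,      # assumed default discount factor
    lam: float = 0.95,        # assumed default GAE lambda
    last_value: float = 0.0,  # assumed bootstrap value after the final token
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one sequence via the backward recursion."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    num_steps = len(rewards)
    advantages = [0.0] * num_steps
    next_advantage = 0.0     # A_{t+1}, zero past the end of the sequence
    next_value = last_value  # V_{t+1}

    # Walk from the last token to the first so A_{t+1} is already known.
    for t in range(num_steps - 1, -1, -1):
        # One-step TD error: delta_t = r_t + gamma * V_{t+1} - V_t
        delta = rewards[t] + gamma * next_value - values[t]
        # GAE recursion: A_t = delta_t + gamma * lam * A_{t+1}
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]

    # Value targets: R_t = A_t + V_t
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = compute_gae([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0)
    print(adv, ret)
```

The single backward pass runs in O(T) time because each advantage reuses the one computed for the following token. A batched version would apply the same loop over the time axis of an array, and would need a mask if sequences in the batch have different lengths.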
