Hard

Implement an inverse-propensity-weighted (IPS) off-policy evaluator for logged agent actions

Agents, Tool Use & Product Post-Training · Problem 6 of 7

Chapter 14 · Agents, Tool Use & Product Post-Training

Implement an inverse-propensity-weighted (IPS) off-policy evaluator for logged agent actions.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
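
As a starting point, here is a minimal sketch of one possible approach, not a reference solution. It assumes each logged record carries the probability the logging policy assigned to the action it took, and it computes the standard IPS estimate, the mean of `target_prob / logging_prob * reward` over the log. The names `LoggedStep`, `ips_estimate`, `target_prob`, `clip`, and `self_normalize` are hypothetical and chosen for illustration; the problem's actual skeleton may differ.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Optional


@dataclass(frozen=True)
class LoggedStep:
    """One logged interaction (hypothetical record layout)."""
    context: Hashable
    action: Hashable
    reward: float
    logging_prob: float  # probability the logging policy gave to `action`


def ips_estimate(
    logs: Iterable[LoggedStep],
    target_prob: Callable[[Hashable, Hashable], float],
    clip: Optional[float] = None,
    self_normalize: bool = False,
) -> float:
    """Estimate the target policy's average reward from logged data.

    target_prob(context, action) returns the probability the target policy
    would pick `action` in `context`. `clip` optionally caps each importance
    weight, trading some bias for lower variance. `self_normalize` divides by
    the sum of weights instead of the sample count (the SNIPS variant).
    """
    weighted_sum = 0.0
    weight_sum = 0.0
    n = 0
    for step in logs:
        if step.logging_prob <= 0.0:
            raise ValueError("logging_prob must be > 0 for every logged action")
        # Importance weight: how much more (or less) likely the target policy
        # is to take the logged action than the logging policy was.
        w = target_prob(step.context, step.action) / step.logging_prob
        if clip is not None:
            w = min(w, clip)
        weighted_sum += w * step.reward
        weight_sum += w
        n += 1
    if n == 0:
        raise ValueError("no logged steps")
    if self_normalize:
        # Assumption: return 0.0 when every weight is zero.
        return weighted_sum / weight_sum if weight_sum > 0 else 0.0
    return weighted_sum / n
```

The plain estimator is unbiased only when the logging policy gives nonzero probability to every action the target policy can take. Clipping and self-normalization are common variance-reduction options, included here as optional extras rather than as requirements of the problem.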
