
Chapter 14: Agents, Tool Use & Product Post-Training

Hard · Problem 6 / 7

Implement an inverse-propensity-weighted (IPS) off-policy evaluator for logged agent actions.

Implement the function skeleton below. Any correct approach is accepted.
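For reference, these are the standard estimators. The notation assumes each log entry records a context x_i, the logged action a_i, the reward r_i, and the logging policy's propensity p_i = μ(a_i | x_i), with π as the target policy. The problem statement does not say how `clip` and `self_normalize` map onto these formulas. Reading `clip` as the weight cap M and `self_normalize` as a switch to SNIPS is an assumption.

```latex
\begin{aligned}
w_i &= \frac{\pi(a_i \mid x_i)}{p_i}, \qquad p_i = \mu(a_i \mid x_i) \\
\hat{V}_{\mathrm{IPS}} &= \frac{1}{n} \sum_{i=1}^{n} w_i \, r_i \\
\hat{V}_{\mathrm{SNIPS}} &= \frac{\sum_{i=1}^{n} w_i \, r_i}{\sum_{i=1}^{n} w_i} \\
\text{clipping:}\quad & w_i \leftarrow \min(w_i, M) \\
\mathbb{E}_{a \sim \mu(\cdot \mid x)}\!\left[\frac{\pi(a \mid x)}{\mu(a \mid x)}\, r(x, a)\right]
  &= \sum_{a} \pi(a \mid x)\, r(x, a)
   = \mathbb{E}_{a \sim \pi(\cdot \mid x)}\!\left[r(x, a)\right]
\end{aligned}
```

The last line explains why plain IPS is unbiased. This holds only when μ(a | x) > 0 wherever π(a | x) > 0. Clipping and self-normalization both introduce some bias in exchange for lower variance.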


solution.py (Python)


import numpy as np

def ips_evaluate(logs, target_policy, clip=None, self_normalize=True):
    raise NotImplementedError
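Below is one possible solution sketch, not a reference answer. It assumes a log format the problem does not specify. Each entry is a dict with the hypothetical keys `"context"`, `"action"`, `"reward"` and `"propensity"`, and `target_policy(context, action)` returns π(a | x). Under those assumptions, `clip` caps each importance weight, and `self_normalize` switches between plain IPS and self-normalized IPS (SNIPS).

```python
import numpy as np


def ips_evaluate(logs, target_policy, clip=None, self_normalize=True):
    """Estimate the target policy's expected reward from logged actions.

    Assumed (hypothetical) conventions, not fixed by the problem statement:
      - each log entry is a dict with keys "context", "action", "reward",
        and "propensity" (the logging policy's probability of that action);
      - target_policy(context, action) returns the target policy's
        probability of choosing `action` in `context`;
      - clip, if given, caps each importance weight at that value.
    """
    logs = list(logs)
    if not logs:
        raise ValueError("logs must be non-empty")

    rewards = np.array([e["reward"] for e in logs], dtype=float)
    propensities = np.array([e["propensity"] for e in logs], dtype=float)
    if np.any(propensities <= 0):
        raise ValueError("logged propensities must be strictly positive")
    target_probs = np.array(
        [target_policy(e["context"], e["action"]) for e in logs], dtype=float
    )

    # Importance weight w_i = pi(a_i | x_i) / mu(a_i | x_i).
    weights = target_probs / propensities
    if clip is not None:
        # Capping large weights trades a little bias for lower variance.
        weights = np.minimum(weights, clip)

    if self_normalize:
        total = weights.sum()
        if total == 0:
            # The target policy gives zero probability to every logged action.
            return float("nan")
        return float(np.dot(weights, rewards) / total)
    return float(np.mean(weights * rewards))


# Tiny worked example: uniform logging over two actions, and a target
# policy that always picks action 1.
def always_action_one(context, action):
    return 1.0 if action == 1 else 0.0


logs = [
    {"context": 0, "action": 1, "reward": 1.0, "propensity": 0.5},
    {"context": 0, "action": 0, "reward": 0.0, "propensity": 0.5},
]
print(ips_evaluate(logs, always_action_one, self_normalize=False))  # 1.0
print(ips_evaluate(logs, always_action_one))  # 1.0
print(ips_evaluate(logs, always_action_one, clip=1.5, self_normalize=False))  # 0.75
```

Self-normalizing divides by the sum of the weights instead of by n. The estimate then becomes a weighted average of the logged rewards, so it stays inside the observed reward range and usually has lower variance. The cost is a bias that shrinks as the log grows. Clipping is a separate bias-variance knob, and the sketch allows the two to be combined.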

