Agents, Tool Use & Product Post-Training · Problem 6 of 7
Implement an inverse-propensity-weighted (IPS) off-policy evaluator for logged agent actions.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
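For reference, one common way to write the estimators the signature hints at. The notation is an assumption, since the problem does not fix a log format: each logged record $i$ has a context $x_i$, an action $a_i$, a reward $r_i$, and a logging propensity $\pi_0(a_i \mid x_i)$; $\pi$ is the target policy and $c$ is the optional `clip` threshold.

```latex
w_i = \frac{\pi(a_i \mid x_i)}{\pi_0(a_i \mid x_i)}, \qquad
w_i^{\mathrm{clip}} = \min(w_i,\, c)

\hat{V}_{\mathrm{IPS}} = \frac{1}{n} \sum_{i=1}^{n} w_i \, r_i, \qquad
\hat{V}_{\mathrm{SNIPS}} = \frac{\sum_{i=1}^{n} w_i \, r_i}{\sum_{i=1}^{n} w_i}
```

Under this reading, `self_normalize=True` would select the second (self-normalized) form, and a non-`None` `clip` would replace $w_i$ with $w_i^{\mathrm{clip}}$ in either form.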
import numpy as np

def ips_evaluate(logs, target_policy, clip=None, self_normalize=True):
    raise NotImplementedError
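A minimal sketch of one way the skeleton could be filled in, not a reference solution. The problem does not specify the log format, so several things here are assumptions: each entry of `logs` is taken to be a dict with the hypothetical keys `"context"`, `"action"`, `"reward"`, and `"propensity"` (the logging policy's probability of the logged action), and `target_policy(context, action)` is assumed to return the target policy's probability of that action. `clip` is read as an upper cap on the importance weights.

```python
import numpy as np


def ips_evaluate(logs, target_policy, clip=None, self_normalize=True):
    """Estimate the target policy's mean reward from logged actions.

    Assumed log layout (not given by the problem): dicts with keys
    "context", "action", "reward", "propensity".
    """
    logs = list(logs)
    if not logs:
        raise ValueError("logs must be non-empty")

    rewards = np.array([log["reward"] for log in logs], dtype=float)
    logged_p = np.array([log["propensity"] for log in logs], dtype=float)
    if np.any(logged_p <= 0):
        # A zero propensity would make the importance weight undefined.
        raise ValueError("logging propensities must be strictly positive")

    # Probability the target policy assigns to each logged action.
    target_p = np.array(
        [target_policy(log["context"], log["action"]) for log in logs],
        dtype=float,
    )

    # Importance weights, optionally capped at `clip` to limit variance.
    weights = target_p / logged_p
    if clip is not None:
        weights = np.minimum(weights, clip)

    weighted_sum = float(np.sum(weights * rewards))
    if self_normalize:
        # Self-normalized IPS: divide by the total weight instead of n.
        total = float(np.sum(weights))
        return weighted_sum / total if total > 0 else 0.0
    return weighted_sum / len(logs)
```

Clipping and self-normalization both trade a small bias for lower variance, which is presumably why the signature exposes them as options. Returning `0.0` when every weight is zero is a choice made for this sketch; raising an error or returning `nan` would be equally defensible.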