SFT, Instruction Tuning, Data & PEFT · Problem 6 of 6
Build a dataset-audit tool that flags duplicated prompts, suspicious templates, tool-call leakage, and evaluation-set contamination.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import hashlib, re
from collections import Counter
def _norm(t):
raise NotImplementedError
def _hash(t):
raise NotImplementedError
def audit(dataset, eval_prompts):
raise NotImplementedErrorReady when you are
Submit your solution and a structured review appears here — verdict, score, and concrete feedback. Any correct approach passes.
Build a dataset-audit tool that flags duplicated prompts, suspicious templates, tool-call leakage, and evaluation-set contamination.
Implement the function/class skeleton in the editor. Any correct approach is accepted.