Chapter 08 · SFT, Instruction Tuning, Data & PEFT · Problem 6 of 6 · Super-hard


Build a dataset-audit tool that flags duplicated prompts, suspicious templates, tool-call leakage, and evaluation-set contamination.
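Exact matching misses eval items that were lightly reworded or embedded inside a longer training prompt, so contamination checks often compare word n-grams instead. Below is a minimal sketch, assuming 8-word n-grams and a 0.5 coverage threshold (both hypothetical defaults, not values given by this problem). `word_ngrams`, `coverage`, and `is_contaminated` are illustrative names. Coverage is measured over the eval item's n-grams, so an eval prompt pasted into a longer training example still scores high.

```python
import re

def word_ngrams(text: str, n: int = 8) -> set:
    """Lowercased word n-grams of `text` (empty if it has fewer than n words)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def coverage(eval_item: str, train_text: str, n: int = 8) -> float:
    """Fraction of the eval item's n-grams that also appear in the training text."""
    ref = word_ngrams(eval_item, n)
    if not ref:
        return 0.0
    return len(ref & word_ngrams(train_text, n)) / len(ref)

def is_contaminated(train_text: str, eval_prompts, n: int = 8,
                    threshold: float = 0.5) -> bool:
    """True if some eval prompt has at least `threshold` of its n-grams in train_text."""
    return any(coverage(e, train_text, n) >= threshold for e in eval_prompts)

eval_set = ["What is the capital of France and why is it historically important to Europe"]
train = "Please answer: what is the capital of France and why is it historically important to Europe? Be brief."
print(is_contaminated(train, eval_set))  # True
```

Eval items shorter than `n` words produce no n-grams in this sketch, so an exact-match check on normalized text is still needed to catch them.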

Implement the function skeleton below. Any correct approach is accepted.


solution.py


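import hashlib, re
from collections import Counter

def _norm(t):
    # Normalize a prompt before comparison (e.g. lowercase, collapse whitespace).
    raise NotImplementedError

def _hash(t):
    # Stable fingerprint of the normalized text, used to detect exact duplicates.
    raise NotImplementedError

def audit(dataset, eval_prompts):
    # Scan `dataset` for duplicated prompts, suspicious templates, tool-call
    # leakage, and overlap with `eval_prompts`; return the flagged findings.
    raise NotImplementedError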
