SFT, Instruction Tuning, Data & PEFT · Problem 4 of 6
Implement a synthetic-data pipeline: generate candidate examples, validate them with a rule-based checker, deduplicate, and report the resulting source mixture.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import hashlib
import re
from collections import Counter

def normalize(text):
    raise NotImplementedError

def prompt_hash(prompt):
    raise NotImplementedError

def rule_checker(ex):
    raise NotImplementedError

def build_dataset(generators):
    raise NotImplementedError
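One possible way to fill in the skeleton is sketched below. The problem statement does not fix the data schema, so several things here are assumptions made for illustration: each example is a dict with string "prompt" and "response" fields, `generators` is a dict mapping a source name to a zero-argument callable that returns an iterable of examples, the checker's rules (non-empty fields, a 2000-character response cap, no echoing of the prompt) are placeholders, and `build_dataset` returns the kept examples together with a `Counter` of per-source counts. Any other correct approach is equally valid.

```python
import hashlib
import re
from collections import Counter

MAX_RESPONSE_CHARS = 2000  # assumed limit, not given in the problem


def normalize(text):
    # Collapse whitespace runs, trim the ends, and lowercase, so that
    # trivially different spellings of a prompt compare equal.
    return re.sub(r"\s+", " ", text).strip().lower()


def prompt_hash(prompt):
    # Stable key for exact-match deduplication on the normalized prompt.
    return hashlib.sha256(normalize(prompt).encode("utf-8")).hexdigest()


def rule_checker(ex):
    # Assumed rules: both fields are non-empty strings, the response is
    # not overly long, and the response does not simply echo the prompt.
    prompt, response = ex.get("prompt"), ex.get("response")
    if not isinstance(prompt, str) or not isinstance(response, str):
        return False
    if not prompt.strip() or not response.strip():
        return False
    if len(response) > MAX_RESPONSE_CHARS:
        return False
    return normalize(prompt) != normalize(response)


def build_dataset(generators):
    # generators: {source_name: callable returning an iterable of examples}
    seen, dataset, mixture = set(), [], Counter()
    for source, generate in generators.items():
        for ex in generate():
            if not rule_checker(ex):
                continue  # failed validation
            key = prompt_hash(ex["prompt"])
            if key in seen:
                continue  # duplicate prompt; the first occurrence wins
            seen.add(key)
            dataset.append({**ex, "source": source})
            mixture[source] += 1
    return dataset, mixture


# Hypothetical generators, used only to exercise the pipeline.
generators = {
    "templated": lambda: [
        {"prompt": "Name a prime number.", "response": "7"},
        {"prompt": "", "response": "orphan answer"},
    ],
    "paraphrased": lambda: [
        {"prompt": "  name a PRIME   number. ", "response": "Seven"},
        {"prompt": "Define overfitting.", "response": "Fitting noise in the training set."},
    ],
}
dataset, mixture = build_dataset(generators)
print(len(dataset))    # 2
print(dict(mixture))   # {'templated': 1, 'paraphrased': 1}
```

In this sketch deduplication is first-come-first-served, so the order of sources in `generators` affects which source is credited for a duplicated prompt and therefore the reported mixture. Hashing the normalized prompt catches only exact matches after normalization; near-duplicate detection would need a different technique.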