Tokenization & Embeddings · Problem 2 of 4
Implement BPE training (learn merges) from a corpus in pure Python.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
from collections import Counter
def get_pair_counts(words):
raise NotImplementedError
def merge_pair(words, pair):
raise NotImplementedError
def train_bpe(corpus, num_merges):
raise NotImplementedErrorReady when you are
Submit your solution and a structured review appears here — verdict, score, and concrete feedback. Any correct approach passes.
Implement BPE training (learn merges) from a corpus in pure Python.
Implement the function/class skeleton in the editor. Any correct approach is accepted.