Tokenization & Embeddings · Problem 4 of 4
Implement a byte-level BPE end-to-end (train encode decode) over arbitrary UTF-8 bytes.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
from collections import Counter
class ByteBPE:
def __init__(self):
raise NotImplementedError
def train(self, text, vocab_size):
raise NotImplementedError
@staticmethod
def _merge(ids, a, b):
raise NotImplementedError
def encode(self, text):
raise NotImplementedError
def decode(self, tokens):
raise NotImplementedErrorReady when you are
Submit your solution and a structured review appears here — verdict, score, and concrete feedback. Any correct approach passes.
Implement a byte-level BPE end-to-end (train encode decode) over arbitrary UTF-8 bytes.
Implement the function/class skeleton in the editor. Any correct approach is accepted.