Practice
Every problem, in one place
86 interactive coding problems and 1177+ theory & math questions across 17 chapters. Filter by difficulty, jump straight in, and track what you've solved.
- Implement scaled dot-product attention with a causal mask in numpy · Coding · Easy
- Implement RMSNorm from scratch in PyTorch · Coding · Easy
- Implement multi-head attention from scratch in PyTorch incl · Coding · Medium
- Implement RoPE applied to a [batch, heads, seq, head_dim] tensor · Coding · Medium
- Implement grouped-query attention with configurable KV heads plus a KV cache for increment · Coding · Hard
- Implement a single MLA layer (down-proj to latent, up-proj, decoupled RoPE) · Coding · Hard
- Implement a numerically stable online-softmax attention pass (FlashAttention recurrence) i · Coding · Super-hard
- Theory questions · 13 Q&A · Theory · Warm-up
- Theory questions · 12 Q&A · Theory · Easy
- Theory questions · 12 Q&A · Theory · Medium
- Theory questions · 11 Q&A · Theory · Hard
- Theory questions · 7 Q&A · Theory · Insane
- Math questions · 6 Q&A · Hands-on / Math · Super easy
- Math questions · 6 Q&A · Hands-on / Math · Medium
- Math questions · 9 Q&A · Hands-on / Math · Hard
- Given a merge list, implement a BPE tokenizer for a string · Coding · Easy
- Implement BPE training (learn merges) from a corpus in pure Python · Coding · Medium
- Implement Viterbi segmentation for a unigram-LM tokenizer given token log-probs · Coding · Hard
- Implement a byte-level BPE end-to-end (train, encode, decode) over arbitrary UTF-8 b · Coding · Super-hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 13 Q&A · Theory · Easy
- Theory questions · 11 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Implement sliding-window causal attention in PyTorch · Coding · Easy
- Implement a KV cache + incremental single-token decode loop for a small transformer · Coding · Medium
- Implement grouped-query attention (GQA) in PyTorch by repeating/broadcasting KV heads acro · Coding · Medium
- Implement a blocked/tiled attention forward pass (FlashAttention-style) with running max/s · Coding · Hard
- Implement YaRN RoPE scaling (frequency grouping + attention-logit scaling) and demonstrate · Coding · Super-hard
- Theory questions · 13 Q&A · Theory · Warm-up
- Theory questions · 12 Q&A · Theory · Easy
- Theory questions · 11 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 8 Q&A · Hands-on / Math · Super easy
- Math questions · 4 Q&A · Hands-on / Math · Hard
- Intuition questions · 7 Q&A · Experiments / Practitioner Intuition · Other
- Implement next-token cross-entropy for a batch of logits/targets in numpy with padding mas · Coding · Easy
- Given Chinchilla coefficients, return compute-optimal $N$ and $D$ for a budget (numeri · Coding · Medium
- Fit to a synthetic grid via least · Coding · Hard
- Implement an IsoFLOP analysis: from loss curves at several fixed-FLOP budgets, extract the · Coding · Super-hard
- Theory questions · 13 Q&A · Theory · Warm-up
- Theory questions · 12 Q&A · Theory · Easy
- Theory questions · 10 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 7 Q&A · Hands-on / Math · Super easy
- Math questions · 5 Q&A · Hands-on / Math · Hard
- Intuition questions · 4 Q&A · Experiments / Practitioner Intuition · Other
- Implement AdamW from scratch in numpy for one parameter tensor · Coding · Easy
- Implement a cosine LR schedule with linear warmup as a callable · Coding · Medium
- Implement global gradient-norm clipping over a list of tensors · Coding · Medium
- Implement the Muon step (momentum + Newton–Schulz orthogonalization) for 2D params, falli · Coding · Hard
- Implement a mixed-precision loop (bf16 compute, fp32 master weights) with loss scaling on · Coding · Super-hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 12 Q&A · Theory · Easy
- Theory questions · 11 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 8 Q&A · Hands-on / Math · Super easy
- Math questions · 5 Q&A · Hands-on / Math · Hard
- Intuition questions · 6 Q&A · Experiments / Practitioner Intuition · Other
- Implement a function computing total per-GPU memory for a given model/parallelism conf · Coding · Easy
- Implement toy data-parallel SGD with manual gradient all-reduce (torch · Coding · Medium
- Implement a 1F1B pipeline-schedule simulator reporting the bubble fraction for given (stag · Coding · Hard
- Implement tensor-parallel linear layers (column-parallel then row-parallel) with correct f · Coding · Super-hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 12 Q&A · Theory · Easy
- Theory questions · 10 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Math questions · 1 Q&A · Hands-on / Math · Insane
- Intuition questions · 6 Q&A · Experiments / Practitioner Intuition · Other
- Implement top-$k$ routing (softmax, top-$k$, renormalized gates) in PyTorch · Coding · Easy
- Implement a full sparse MoE FFN with capacity, token dropping, and gate-weighted combinati · Coding · Medium
- Implement the load-balancing aux loss and a training step demonstrating it equalizes exper · Coding · Hard
- Implement expert-parallel dispatch/combine with a simulated all-to-all and verify outputs · Coding · Super-hard
- Theory questions · 13 Q&A · Theory · Warm-up
- Theory questions · 13 Q&A · Theory · Easy
- Theory questions · 11 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Math questions · 1 Q&A · Hands-on / Math · Insane
- Intuition questions · 4 Q&A · Experiments / Practitioner Intuition · Other
- Implement prompt-token loss masking given (prompt_len, total_len) per sample · Coding · Easy
- Implement a LoRA-wrapped linear layer (frozen $W$, trainable $A$ and $B$, scaled by $ · Coding · Medium
- Implement sequence packing with a block-diagonal attention mask so packed samples can't at · Coding · Hard
- Implement a synthetic-data pipeline: generate candidate examples, validate them with a rul · Coding · Hard
- Implement QLoRA-style NF4 4-bit quantization of a weight matrix plus a LoRA adapter, verif · Coding · Super-hard
- Build a dataset-audit tool that flags duplicated prompts, suspicious templates, tool-call · Coding · Super-hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 12 Q&A · Theory · Easy
- Theory questions · 13 Q&A · Theory · Medium
- Theory questions · 13 Q&A · Theory · Hard
- Theory questions · 7 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Intuition questions · 7 Q&A · Experiments / Practitioner Intuition · Other
- Implement the Bradley–Terry pairwise reward-model loss in PyTorch · Coding · Easy
- Implement GAE (the backward recursion) given per-token rewards and value estimates · Coding · Medium
- Implement the GRPO group-normalized advantage and the clipped token-level objective · Coding · Medium
- Implement a minimal PPO update for LLMs: ratio, clipped surrogate, value loss, per-token K · Coding · Hard
- Implement Dr · Coding · Hard
- Implement an end-to-end toy RLHF loop on a “bandit-LM”: train an RM from synthetic prefe · Coding · Super-hard
- Implement DAPO on a toy RLVR task: clip-higher, dynamic sampling (drop all-correct/all-wro · Coding · Super-hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 13 Q&A · Theory · Easy
- Theory questions · 10 Q&A · Theory · Medium
- Theory questions · 13 Q&A · Theory · Hard
- Theory questions · 7 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Math questions · 10 Q&A · Hands-on / Math · Hard
- Math questions · 2 Q&A · Hands-on / Math · Insane
- Intuition questions · 12 Q&A · Experiments / Practitioner Intuition · Other
- Implement the DPO loss given policy/reference logprobs for chosen/rejected and $\beta$ · Coding · Easy
- Implement SimPO (length-normalized, reference-free) and KTO losses and unit-test on toy da · Coding · Medium
- Implement best-of-$N$ selection given a reward/verifier over sampled completions · Coding · Medium
- Implement on-policy distillation: sample from the student, score tokens under a (toy) teac · Coding · Hard
- Implement iterative/online DPO: generate on-policy pairs, label with a toy preference func · Coding · Super-hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 11 Q&A · Theory · Easy
- Theory questions · 11 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 7 Q&A · Hands-on / Math · Super easy
- Math questions · 7 Q&A · Hands-on / Math · Hard
- Intuition questions · 6 Q&A · Experiments / Practitioner Intuition · Other
- Implement self-consistency: sample CoTs, extract answers, return the majority vote · Coding · Easy
- Implement best-of-$N$ selection given a reward/verifier over sampled completions · Coding · Medium
- Implement beam-search-over-reasoning-steps that expands/prunes partial CoTs using a PRM sc · Coding · Hard
- Implement a rule-based-reward GRPO loop on a toy arithmetic task rewarding a correct boxed · Coding · Super-hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 12 Q&A · Theory · Easy
- Theory questions · 11 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Math questions · 1 Q&A · Hands-on / Math · Insane
- Intuition questions · 5 Q&A · Experiments / Practitioner Intuition · Other
- Implement ECE given arrays of predicted confidences and correctness · Coding · Easy
- Implement a pairwise LLM-as-judge harness with position-swap debiasing (run both orders, a · Coding · Medium
- Implement a bootstrap confidence interval for win-rate from paired preference judgments · Coding · Hard
- Implement an n-gram/embedding contamination detector that flags eval items overlapping a t · Coding · Super-hard
- Theory questions · 13 Q&A · Theory · Warm-up
- Theory questions · 13 Q&A · Theory · Easy
- Theory questions · 13 Q&A · Theory · Medium
- Theory questions · 12 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Math questions · 1 Q&A · Hands-on / Math · Insane
- Intuition questions · 4 Q&A · Experiments / Practitioner Intuition · Other
- Implement temperature + top-$k$ + top-$p$ sampling from a logits vector in numpy · Coding · Easy
- Implement int8 symmetric per-channel weight quantization and dequantization for a linear l · Coding · Medium
- Implement nucleus (top-$p$) sampling with correct renormalization and edge cases · Coding · Medium
- Implement speculative decoding: draft proposes tokens, target verifies in one pass, ac · Coding · Hard
- Implement a continuous-batching scheduler simulator with a paged KV cache that admits/evic · Coding · Super-hard
- Theory questions · 13 Q&A · Theory · Warm-up
- Theory questions · 13 Q&A · Theory · Easy
- Theory questions · 11 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 9 Q&A · Hands-on / Math · Super easy
- Math questions · 4 Q&A · Hands-on / Math · Hard
- Intuition questions · 5 Q&A · Experiments / Practitioner Intuition · Other
- Validate a tool-call object against a JSON schema (required fields and types) · Coding · Easy
- Compute an agent success-rate metric over a list of trajectories · Coding · Easy
- Implement a tool-calling loop with retries, timeouts, and error handling · Coding · Medium
- Implement a code-evaluation harness that runs unit tests against generated solutions and r · Coding · Medium
- Implement a preference-dataset builder from accepted/rejected suggestions, including a pos · Coding · Hard
- Implement an inverse-propensity-weighted (IPS) off-policy evaluator for logged agent actio · Coding · Hard
- Build a toy agent environment where the model can solve a task, call tools, fail safely, o · Coding · Super-hard
- Theory questions · 14 Q&A · Theory · Warm-up
- Theory questions · 14 Q&A · Theory · Easy
- Theory questions · 10 Q&A · Theory · Medium
- Theory questions · 8 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 9 Q&A · Hands-on / Math · Super easy
- Math questions · 5 Q&A · Hands-on / Math · Hard
- Intuition questions · 6 Q&A · Experiments / Practitioner Intuition · Other
- Implement the forward masking process for a masked diffusion LM (mask a fraction of to · Coding · Easy
- Implement a single reverse-denoising step: predict all masked tokens, keep the most confid · Coding · Medium
- Implement a minimal masked-diffusion training loss (sample a mask rate, mask tokens, cross · Coding · Hard
- Implement a small end-to-end masked diffusion LM sampler with confidence-based remasking a · Coding · Super-hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 12 Q&A · Theory · Easy
- Theory questions · 12 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Math questions · 2 Q&A · Hands-on / Math · Insane
- Intuition questions · 4 Q&A · Experiments / Practitioner Intuition · Other
- Implement patch embedding (conv or unfold + linear) converting an image tensor to patch · Coding · Easy
- Implement a projection adapter and interleave image/text embeddings into one sequence wi · Coding · Medium
- Implement cross-attention adapter layers letting text tokens attend to frozen vision featu · Coding · Hard
- Theory questions · 12 Q&A · Theory · Warm-up
- Theory questions · 13 Q&A · Theory · Easy
- Theory questions · 10 Q&A · Theory · Medium
- Theory questions · 10 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 10 Q&A · Hands-on / Math · Super easy
- Math questions · 1 Q&A · Hands-on / Math · Insane
- Intuition questions · 3 Q&A · Experiments / Practitioner Intuition · Other
- Write shape assertions for the tensors in an attention forward pass · Coding · Easy
- Write a check that verifies labels are correctly shifted by one relative to inputs · Coding · Easy
- Write unit tests for causal masking (no token may attend to the future) · Coding · Medium
- Write a cached-vs-full-forward equivalence test for incremental decoding · Coding · Medium
- Debug a broken GPT training notebook containing NaNs, a mask bug, and shifted labels · Coding · Hard
- Debug a broken GRPO implementation with wrong grouping, wrong masks, and wrong old-logprob · Coding · Hard
- Build a toy post-training stack with intentional bugs, then write tests that catch every o · Coding · Super-hard
- Implement an automated regression harness that compares outputs, logits, losses, rewards, · Coding · Super-hard
- Theory questions · 8 Q&A · Theory · Warm-up
- Theory questions · 10 Q&A · Theory · Easy
- Theory questions · 9 Q&A · Theory · Medium
- Theory questions · 6 Q&A · Theory · Hard
- Theory questions · 5 Q&A · Theory · Insane
- Math questions · 6 Q&A · Hands-on / Math · Super easy
- Math questions · 9 Q&A · Hands-on / Math · Medium
- Math questions · 2 Q&A · Hands-on / Math · Insane
- Intuition questions · 6 Q&A · Experiments / Practitioner Intuition · Other
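
For a sense of what an Easy coding problem asks for, here is a minimal sketch of the first one listed, scaled dot-product attention with a causal mask in numpy. It is not the site's reference solution. The function name `causal_attention` and the single-head, unbatched `(seq_len, d)` input layout are assumptions made for illustration.

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    Assumed layout (hypothetical, chosen for this sketch):
    q, k have shape (seq_len, d); v has shape (seq_len, d_v).
    """
    seq_len, d = q.shape
    # Scaled similarity between every query and every key.
    scores = q @ k.T / np.sqrt(d)
    # True above the diagonal marks future positions, which must be hidden.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Softmax over keys; subtracting the row max keeps exp() stable.
    # Each row keeps its diagonal entry, so the row max is always finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Two properties make quick sanity checks: the first output row equals `v[0]`, because position 0 can only attend to itself, and changing a later row of `v` leaves all earlier output rows unchanged.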
