Inference & Serving · Problem 5 of 5
Implement a continuous-batching scheduler simulator with a paged KV cache that admits/evicts requests and reports throughput and P99.
Implement the function/class skeleton in the editor. Any correct approach is accepted.
import numpy as np
from collections import deque
class Request:
def __init__(self, rid, arrival, prompt_len, gen_len):
raise NotImplementedError
class PagedScheduler:
def __init__(self, total_pages, page_size, max_batch):
raise NotImplementedError
def _pages_needed(self, seq_len):
raise NotImplementedError
def _try_admit(self):
raise NotImplementedError
def _evict(self, req, t):
raise NotImplementedError
def run(self, arrivals):
raise NotImplementedErrorReady when you are
Submit your solution and a structured review appears here — verdict, score, and concrete feedback. Any correct approach passes.
Implement a continuous-batching scheduler simulator with a paged KV cache that admits/evicts requests and reports throughput and P99.
Implement the function/class skeleton in the editor. Any correct approach is accepted.