Hard

Fit $L(N,D) = E + A/N^{\alpha} + B/D^{\beta}$ to a synthetic $(N, D, \text{loss})$ grid via least squares in log space

Pretraining Objectives & Scaling Laws · Problem 3 of 4

Chapter 04: Pretraining Objectives & Scaling Laws

Fit $L(N,D) = E + A/N^{\alpha} + B/D^{\beta}$ to a synthetic $(N, D, \text{loss})$ grid via least squares in log space.

Implement the function/class skeleton in the editor. Any correct approach is accepted.
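
One possible approach, sketched below, minimizes the squared difference between $\log L(N,D)$ and the log of the observed loss with `scipy.optimize.least_squares`. The names `log_model` and `fit_scaling_law`, the parametrization by $(\log E, \log A, \log B, \alpha, \beta)$, and the grid of starting points are assumptions made for this illustration, not part of the problem's skeleton. Several starting points are tried because the objective is non-convex; the fit with the lowest cost is kept.

```python
import itertools

import numpy as np
from scipy.optimize import least_squares


def log_model(theta, log_n, log_d):
    """Return log L(N, D) for theta = (log E, log A, log B, alpha, beta)."""
    e, a, b, alpha, beta = theta
    # log(E + A/N^alpha + B/D^beta) computed as a stable log-sum-exp
    # of the three terms, each expressed in log space.
    terms = np.stack([
        np.full_like(log_n, e),
        a - alpha * log_n,
        b - beta * log_d,
    ])
    return np.logaddexp.reduce(terms, axis=0)


def fit_scaling_law(n, d, loss):
    """Least-squares fit in log space; returns a dict of fitted parameters."""
    log_n = np.log(np.asarray(n, dtype=float))
    log_d = np.log(np.asarray(d, dtype=float))
    log_loss = np.log(np.asarray(loss, dtype=float))

    def residuals(theta):
        return log_model(theta, log_n, log_d) - log_loss

    # Hypothetical grid of starting points (an assumption of this sketch).
    starts = itertools.product(
        (0.0, 1.0),   # log E
        (3.0, 7.0),   # log A
        (3.0, 7.0),   # log B
        (0.2, 0.5),   # alpha
        (0.2, 0.5),   # beta
    )
    best = None
    for x0 in starts:
        result = least_squares(residuals, np.array(x0))
        if best is None or result.cost < best.cost:
            best = result

    e, a, b, alpha, beta = best.x
    return {
        "E": float(np.exp(e)),
        "A": float(np.exp(a)),
        "B": float(np.exp(b)),
        "alpha": float(alpha),
        "beta": float(beta),
    }


if __name__ == "__main__":
    # Synthetic, noise-free grid; the generating constants are illustrative only.
    n, d = np.meshgrid(np.logspace(7, 10, 6), np.logspace(9, 12, 6))
    n, d = n.ravel(), d.ravel()
    loss = 1.69 + 406.4 / n**0.34 + 410.7 / d**0.28
    print(fit_scaling_law(n, d, loss))
```

Optimizing $\log E$, $\log A$, and $\log B$ rather than the raw coefficients keeps all three terms positive and puts the parameters on comparable scales. With noisy data, a robust loss (for example the `loss="huber"` option of `least_squares`) may be worth considering.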
