by jbellis 30 days ago
Really cool work!

Does the training data budget scale with model size?

How would you compare this with the Gemma 4 draft model, which is also integrated with the base KV cache?