| HN Mirror

by jbellis 30 days ago

Really cool work!

Does the training data budget scale with model size?

How would you compare this to the Gemma 4 draft model, which is also integrated with the base KV cache?