by wyager 20 days ago

This description falls apart for two reasons:

1. It only accurately describes pre-training.

2. It ignores the existence of generalization.

Next-token prediction is just a training task, not "what the model does internally" in any meaningful sense.