by wyager
20 days ago
This description falls apart for two reasons:
1. It only accurately describes pre-training.
2. It ignores the existence of generalization.

Next-token prediction is just a training task, not "what the model does internally" in any meaningful sense.