by georgehotz
25 days ago
Author here. I have never said that phrase before this blog post and certainly understand the absurdity of it. I don't mean that you need something biological, or whatever consciousness might or might not be. However, there's still a distinction. Unless I'm responding to an LLM, you had a childhood. You learned about the world and space and agency before you ever learned how to program. And you didn't learn it from billions of examples; you learned from a few examples, some self-directed experiments, some feedback from teachers, etc. I'm saying that's what matters. The process matters. You didn't learn to mimic a distribution, you learned to program. Of course, in the perfect mathematical limit it's the same, but in practice it's not.
|
1. It only accurately describes pre-training
2. It ignores the existence of generalization

Next token prediction is just a training task, not "what the model does internally" in any meaningful sense.