by pona-a
472 days ago
Would that make in-context learning a superset or a subset of pretraining? This paper [0] claimed transformers learn a gradient-descent mesa-optimizer as part of in-context learning, while being guided by the pretraining objective, and as the parent mentioned, any general reasoner can bootstrap a world model from first principles.

[0] https://arxiv.org/pdf/2212.07677
I guess a superset. But it doesn't really matter either way. Ultimately, there's no useful distinction between pretraining and in-context learning. The split is just an artifact of the current technology.