Hacker News
by pona-a 472 days ago
Would that make in-context learning a superset or a subset of pretraining?

This paper [0] claims that transformers learn a gradient-descent mesa-optimizer as part of in-context learning, guided by the pretraining objective. And as the parent mentioned, any general reasoner can bootstrap a world model from first principles.

[0] https://arxiv.org/pdf/2212.07677
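To make the "gradient descent inside attention" idea concrete, here is a minimal sketch, not the paper's full construction. As I understand it, the core observation is that one gradient step on a least-squares loss, starting from zero weights, gives the same prediction as a softmax-free (linear) attention readout over the in-context examples. The function names, the toy data, and the learning rate are all my own assumptions for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def gd_one_step_prediction(xs, ys, x_query, lr):
    """One gradient step from w = 0 on L(w) = 1/2 * sum_i (w.x_i - y_i)^2,
    then predict w.x_query. (Hypothetical helper, for illustration only.)"""
    dim = len(x_query)
    w = [0.0] * dim
    # dL/dw_j = sum_i (w.x_i - y_i) * x_i[j]
    grad = [
        sum((dot(w, x) - y) * x[j] for x, y in zip(xs, ys))
        for j in range(dim)
    ]
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Attention without softmax: keys = x_i, values = y_i, query = x_query.
    The learning rate plays the role of a fixed output scale."""
    return lr * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


# Toy in-context examples (made up for this sketch).
xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
ys = [1.0, -2.0, 0.5]
x_query = [2.0, 1.0]
lr = 0.1

a = gd_one_step_prediction(xs, ys, x_query, lr)
b = linear_attention_prediction(xs, ys, x_query, lr)
print(abs(a - b) < 1e-9)  # the two predictions agree up to float rounding
```

The two functions agree because, at w = 0, the gradient is just minus the sum of y_i * x_i, so the updated weights are a weighted sum of the context inputs, which is exactly what the attention readout computes when dotted with the query.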

1 comment

> Would that make in-context learning a superset or a subset of pretraining?

I guess a superset. But it doesn't really matter either way. Ultimately, there's no useful distinction between pretraining and in-context learning. The split is just an artifact of the current technology.