| HN Mirror

by pona-a 472 days ago

Would that make in-context learning a superset or a subset of pretraining?

This paper [0] claimed that transformers learn a gradient-descent mesa-optimizer as part of in-context learning, guided by the pretraining objective. And as the parent mentioned, any general reasoner can bootstrap a world model from first principles.

[0] https://arxiv.org/pdf/2212.07677
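
For intuition, here is a toy sketch of the simplest version of that claim as I understand it: on linear regression, one gradient-descent step from zero weights gives the same prediction as a softmax-free (linear) attention readout over the in-context examples. This is not the paper's code. The function names, the learning rate, and the data are all made up for illustration, and it is plain Python so it runs without dependencies.

```python
import random

def gd_step_prediction(xs, ys, x_query, lr):
    """Predict y for x_query after ONE gradient-descent step from w = 0
    on the loss (1 / 2N) * sum_i (w . x_i - y_i)^2."""
    n, d = len(xs), len(x_query)
    # Gradient at w = 0 is -(1/N) * sum_i y_i * x_i,
    # so the updated weights are w1 = (lr / N) * sum_i y_i * x_i.
    w = [lr / n * sum(y * x[j] for x, y in zip(xs, ys)) for j in range(d)]
    return sum(wj * qj for wj, qj in zip(w, x_query))

def linear_attention_prediction(xs, ys, x_query, lr):
    """Same prediction written as attention without softmax:
    keys = x_i, values = (lr / N) * y_i, query = x_query."""
    n = len(xs)
    # Unnormalized attention scores k_i . q for each in-context example.
    scores = [sum(kj * qj for kj, qj in zip(x, x_query)) for x in xs]
    # Output is the score-weighted sum of the values.
    return sum((lr / n * y) * s for y, s in zip(ys, scores))

def demo(seed=0, n=8, d=3, lr=0.1):
    """Build a random noiseless regression task (hypothetical data)
    and return both predictions for one query point."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(d)]
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ys = [sum(w * xj for w, xj in zip(w_true, x)) for x in xs]
    x_query = [rng.gauss(0, 1) for _ in range(d)]
    return (gd_step_prediction(xs, ys, x_query, lr),
            linear_attention_prediction(xs, ys, x_query, lr))

if __name__ == "__main__":
    gd_pred, attn_pred = demo()
    print(gd_pred, attn_pred)  # the two values agree up to float rounding
```

The two functions compute the same sum in a different order, which is the whole point: the "learning" happens in the forward pass over the context, with no weight update to the model itself.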

1 comment

ta8645 472 days ago

> Would that make in-context learning a superset or a subset of pretraining?

I guess a superset. But it doesn't really matter either way. Ultimately, there's no useful distinction between pretraining and in-context learning; the split is just an artifact of the current technology.
