by agnosticmantis 596 days ago
These papers don’t explain how pretrained LLMs learn in-context, because the simplified models in them are either pretrained on the same task that’s tested in-context, or have weights handpicked by humans to perform gradient descent (GD) at inference time.
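
To make the "handpicked weights" point concrete, here is a toy sketch of my own, not taken from any specific paper. It assumes linear regression, a zero initial weight vector, identity key/query projections, and attention without softmax. Under those assumptions, one GD step gives the same prediction as a single attention readout, but only because the weights were chosen by hand to make that true.

```python
import numpy as np

def gd_one_step_prediction(X, y, x_query, lr):
    """Predict after one gradient step on squared loss, starting from w = 0."""
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    grad = X.T @ (X @ w - y) / n   # gradient of (1 / 2n) * ||Xw - y||^2
    w = w - lr * grad
    return w @ x_query

def linear_attention_prediction(X, y, x_query, lr):
    """Same prediction from softmax-free attention with hand-set weights."""
    n = X.shape[0]
    scores = X @ x_query           # key . query, identity projections (assumed)
    return lr / n * (scores @ y)   # values are the labels, scaled by lr / n

# Hypothetical in-context examples: 16 points from a random linear function.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 4))
y = X @ rng.normal(size=4)
x_query = rng.normal(size=4)

gd_pred = gd_one_step_prediction(X, y, x_query, lr=0.1)
attn_pred = linear_attention_prediction(X, y, x_query, lr=0.1)
```

The two predictions agree up to floating-point error. Nothing here was learned by pretraining, which is the gap being pointed out.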

See this video for a good discussion: https://youtu.be/-yo2672UikU