See my reply to sibling: https://news.ycombinator.com/item?id=34672865

A more interesting example of transformers learning a process may be [1].

There's a large literature on applying language models to reasoning tasks, but not much on what's actually going on inside them. See, for example, [2]. https://transformer-circuits.pub/ also has a body of work on this, though it's still at a very early stage (see in particular "In-context Learning and Induction Heads").

[1] Extraction of organic chemistry grammar from unsupervised learning of chemical reactions https://www.science.org/doi/10.1126/sciadv.abe4166

[2] Analyzing the Structure of Attention in a Transformer Language Model https://arxiv.org/abs/1906.04284