A more interesting example of transformers learning a process may be [1].
There's a large literature on applying language models to reasoning tasks, but not much on what's actually going on inside them. For one example, see [2]. https://transformer-circuits.pub/ also hosts a body of work on this, though it is still at a very early stage (see in particular "In-context Learning and Induction Heads").
[1] Extraction of organic chemistry grammar from unsupervised learning of chemical reactions https://www.science.org/doi/10.1126/sciadv.abe4166
[2] Analyzing the Structure of Attention in a Transformer Language Model https://arxiv.org/abs/1906.04284