Just to make it clear, I see only one breakthrough [0]. Everything that happened afterwards is just an application of that breakthrough with different training sets, to different domains, etc.
That includes autoregressive language models, the discovery of the Chinchilla scaling law, MoEs, supervised fine-tuning, RLHF, whatever was used to create OpenAI o1, diffusion models, AlphaGo, AlphaFold, AlphaGeometry, and AlphaProof.
They are the same breakthrough applied to different domains; I don't see them as distinct. We will need a new breakthrough, not the same solution applied to new things.
[0]: https://en.wikipedia.org/wiki/Attention_Is_All_You_Need