To be clear, I see only one breakthrough [0]. Everything that happened afterwards is just an application of that breakthrough to different training sets, different domains, etc.
That includes autoregressive language models, the discovery of the Chinchilla scaling law, MoEs, supervised fine-tuning, RLHF, whatever was used to create OpenAI o1, diffusion models, AlphaGo, AlphaFold, AlphaGeometry, and AlphaProof.