by StrLght 636 days ago
Which ones are you referring to?

To be clear, I see only one breakthrough [0]. Everything that has happened since is just an application of that breakthrough to different training sets, different domains, and so on.

[0]: https://en.wikipedia.org/wiki/Attention_Is_All_You_Need

1 comment

Autoregressive language models, the discovery of the Chinchilla scaling law, MoEs, supervised fine-tuning, RLHF, whatever was used to create OpenAI o1, diffusion models, AlphaGo, AlphaFold, AlphaGeometry, AlphaProof.