Hacker News new | ask | show | jobs
by strin 2212 days ago
> “GPT-3″ is just a bigger GPT-2. In other words, it’s a straightforward generalization of the “just make the transformers bigger” approach

Yes it’s true. But there is a difference between what’s interesting and what works. deep learning (RNNs, transformers, etc.) is usually old ideas applied at large scale with slight modifications. Proving a model works well at large scale (175B parameters) is a great contribution and measures our progress towards AI.