Hacker News
by soraki_soladead 1219 days ago
Only some model architectures continue to get better as you pump in more data. Transformers and their variants have this property more so than prior architectures.