Hacker News
by avereveard 623 days ago
Well yes, but actually no, I guess: the benefit of transformers at the time was that they were more stable during training, which made it possible to train larger and larger networks on larger and larger datasets.