Hacker News
by visarga 2470 days ago
It was trained on 140GB of text on 256 TPUs for 2 weeks, and the model has 48 transformer layers. I'm wondering when we will see a model trained on 1TB or 10TB of text.

I doubt training a scaled-up transformer on 10TB of text will lead to significant improvements (btw, 10TB is about the size of all books in English in the Library of Congress). Image classifiers don't get much better when trained on far more data than ImageNet. 140GB is probably enough to train a general model, which could then be finetuned on extra data for specific tasks.

Text generators need a world model and situational awareness, something like a map and a GPS signal. So we are probably two major breakthroughs away from a machine that actually understands something (or at least one that seems to understand something, if you're philosophically opposed to the idea that a machine can understand anything).