I doubt training a scaled-up transformer on 10TB of text will lead to significant improvements (btw, 10TB is about the size of all books in English in the Library of Congress). Image classifiers don't get much better when trained on far more data than ImageNet. 140GB is probably enough to train a general model, which could then be fine-tuned on extra data for specific tasks.

Text generators need a world model and situational awareness, something like a map and a GPS signal. So we are probably two major breakthroughs away from a machine that actually understands something (or at least one that seems to understand something, if you're philosophically opposed to the idea that a machine can understand anything).