by VikingCoder 2435 days ago
> 2) T5 was trained with ~750GB of texts or ~150 billion words, which is > 100 times the number of words native English speakers acquire by the age of 20.

...but humans evolved the ability to use language over hundreds of generations... So maybe that's not such a bad thing?

2 comments

Indeed, this is important to realize: training such a generic model from scratch reiterates not only learning, but also the entire evolutionary process that led to the emergence of neural circuits capable of such learning. That perspective makes many of the current achievements, error-prone as they might be, even more impressive!
The amount of data required may not be a decisive factor in itself, but rather a canary in the coal mine: a sign that something is off.

If we wish to use a model in critical situations, such as a medical setting or commanding a self-driving car, 1) and 4) above cannot be ignored.