by eirikbakke 281 days ago
Humans require a _lot_ less training data to become, for instance, fluent in English. If a given AI algorithm needs to be trained on the entire Internet to accomplish the same, then it seems safe to assume that the data has not really been "mined out".

Generating more training data from the same original data should not be fundamentally problematic in that sense.

2 comments

It only seems that way because much of the data that humans use is not in a format that computers would understand. A toddler learning to talk is engaging their full body.
Humans also have billions of years of evolution and trillions of organisms behind them, which developed a receptacle biased towards learning language.
Billions of years of evolution, but still limited to the data that is replicated in the human genome/DNA, which is about 3 billion base pairs: roughly 3 gigabytes as plain text, or under 1 gigabyte at 2 bits per base (+epigenome).
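
For anyone who wants to sanity-check the genome size figure, here is a rough back-of-the-envelope sketch. The base-pair count (about 3.1 billion, haploid) and the two encodings are my assumptions, not something stated in the thread, and the epigenome is ignored entirely.

```python
# Back-of-the-envelope genome size estimate.
# All constants below are assumptions for illustration.
BASE_PAIRS = 3_100_000_000  # approximate haploid human genome length (assumed)


def genome_size_gb(base_pairs: int, bits_per_base: int) -> float:
    """Return the storage size in gigabytes (10**9 bytes)."""
    return base_pairs * bits_per_base / 8 / 1e9


# One ASCII character per base, as in a plain text file.
text_gb = genome_size_gb(BASE_PAIRS, 8)

# Four possible bases (A, C, G, T) fit in 2 bits each.
packed_gb = genome_size_gb(BASE_PAIRS, 2)

print(f"plain text:   {text_gb:.2f} GB")
print(f"2-bit packed: {packed_gb:.3f} GB")
```

Either way the total is tiny next to an Internet-scale text corpus, which is the point of the comparison.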