Hacker News new | ask | show | jobs
by HarHarVeryFunny 782 days ago
> That's why it was already evident back then that Google, Facebook, etc were sitting on a goldmine because in the long run those with the most data, not the brightest PhDs will win this race.

So far it doesn't seem to be panning out that way though. Companies such as OpenAI, Anthropic and Reka don't have any special internal sources of data, yet all have trained SOTA models.

Probably the main reason for this is that data type/quality matters more than quantity, which is why most of these companies are now using self-generated synthetic data.

The companies/institutes that will have a data advantage are those that have private datasets consisting of a different type (or maybe higher quality?) of data than publicly available, but this seems more likely to be in specialized domains (medical, etc), rather than what is useful for general intelligence.

I assume that, longer term, we'll have better AI architectures capable of realtime learning, and then the focus may switch on-the-job training and learning ability, rather than data.