by stcredzero 888 days ago
Open Source and Free Software weren't formulated to deal with the need for such gargantuan amounts of data and compute.

Can the public compete? What percentage of the technical public could we expect to participate, and how much data, compute, and data quality improvement could they bring to the table? I suspect that large corporations are at least an order of magnitude advantaged economically.

3 comments

There is a big effort underway in China that works at this magnitude; Yuanqing Lin gave an interview about it on the deep learning course [1]. They suggest that they will host the resources to store the data and train on it, and make all those algorithms available in China.

[1] https://www.youtube.com/watch?v=3GfOnI3goAk

The public doesn't have the resources to train the largest state-of-the-art LLMs, but training useful LLMs seems doable. Maybe not for most individuals but certainly for a range of nonprofits, research teams and companies.
Isn't it relatively easy for a smaller model to poke holes in the output of a larger model?
But not nearly as within reach as modifying open source models.
Open Source and Free Software are not about the amount of data.