Hacker News
by mrtranscendence 1093 days ago
There are such datasets, and AI companies absolutely pay to have data curated. But I suspect it would be unimaginably expensive to create a dataset from scratch with enough tokens to train a model with hundreds of billions of parameters while paying every participant fairly.

"Fair" is somewhat undefined: the fair-looking number for being paid for effort can be very different from the fair-looking number for being paid for the resale value of the end product on an open market.

I wonder what an LLM trained on Google code and internal documents would look like.