Hacker News
by ben_w 1091 days ago
It doesn't require humans to work for free. That's been a common default MO ever since everyone looked at Google building a search index and thought to themselves, "if they're doing it, surely so can we," but there are also data sets made by paying people.

There are such datasets, and AI companies absolutely pay to have data curated. But I suspect it would be unimaginably expensive to create a dataset from scratch with enough tokens to train a model with hundreds of billions of parameters while paying every participant fairly.
"Fair" is somewhat undefined: the fair-looking number for being paid for effort can be very different to the fair-looking number for being paid for the resale value of the end product on an open market.

I wonder what an LLM trained on Google's code and internal documents would look like.