by aspenmartin
24 days ago
Well, that historical content and code still exists, right? Are you just saying “what if we’re in a world of walled gardens now, where OSS dies because people don’t want their work stolen”? In that case, these companies will still get data, and they don’t need OSS anymore. It’s already web-crawled, licensed, or commissioned; they pay people to generate novel traces when they need them, or at the very least sets of prompts and tests for verification. Then the synthetic data that passes verification gets added to the training set.
Do you think creating the orders of magnitude of content that the internet produced organically, and which LLM creators are stealing, is cheap? If they actually have to pay for content creation while competing with content creators on the, you know, content creation front via LLM generation, the entire business model of LLMs collapses.
You can't have the mountains of data needed for LLMs in the decades to come if your LLMs put the writers and artists out of work.