by kevin42
136 days ago
I’m genuinely curious how you feel about LLMs being trained on pirated material. Not being snarky here. Your comment reflects the old “information wants to be free” ideals that used to dominate places like HN, Slashdot, and Reddit. But since LLMs arrived, a lot of the loudest voices here argue the opposite position when it comes to training data. I’ve been trying to understand whether people have actually changed their views, or whether it’s mostly a shift in who is speaking up now.

But as a pirate, I specialize in finding hidden, hard-to-find, or otherwise lost sources. They're not making anybody any money, and I absolutely do not sell anything that's not mine (freely given).
That said, having every commercial work available for ingestion into an LLM is an amazing way to train an AI. However, if you're going to use piracy at scale to train, you should not be able to sell the LLM or access to it.
And yeah, that wrecks every corporate LLM strategy. Boo fucking hoo.
Do creators need to be paid for the content they create? Ideally, yes! Do they deserve iron-fisted control of your hardware (DRM) to enforce their demands? Fuck no!
Ideally, the LLMs would be FLOSS: full weights published, lists of the content used so anyone can reproduce the training, etc. We could prune bad content and add more good. But the problem, again, is that whoever does this must violate copyright, because copyright, the way it's implemented, is terrible.
In reality, I like the RIAA's congressional solution. You send a check to BMI/ASCAP for however many plays you did and you're good. That could be extended to books and shows. If that were done, you could have a New-Flix service that literally has every show and movie in existence. You'd just pay a reasonable monthly fee to access the whole of humanity's video.
Alas. Guess I'll have to build it myself.