by PLenz 99 days ago
This is the real reason the ultra rich are buying media companies. They expect the existing copyright laws to prevail in court and to either make significant revenue licensing IP for training or to take large stakes in AI companies in return for the IP.

Only data is a moat, not algos, not compute.


If this happens then free and open content (the Wikipedia model, more or less) becomes a hugely impactful "commoditize the complement" play for the big AI and tech firms. Every good piece of open content is something that AI firms don't have to license from a proprietary supplier. And if models trained on entirely open content can write an acceptable "first draft" of something new, that's huge acceleration.
Seems like a bad bet to me. It looks like authors are going to lose this case, setting the precedent that not only do you not need to license training data, but obtaining it illegally (for free) is totally okay.
Didn't Anthropic's case already set the precedent that training itself is fine? It's not like copyrighted novels are a large portion of human-generated text data. It's just the stuff that's easier to get because it's preserved in bulk.

Video transcription has more or less been solved. Imagine how much data Google has in YouTube transcripts. And the longer these AI chatbots operate, the more data they collect for training as well (I think Google making it easy to upvote or downvote the bot's responses is a good idea).

IIRC the Anthropic case wasn't precedent-setting, for some reason I don't remember.