Seems like a bad bet to me. It looks like authors are going to lose this case, setting the precedent that not only do you not need to license training data, but obtaining it illegally (for free) is totally okay.
Didn't Anthropic's case already set the precedent that training itself is fine? It's not like copyrighted novels are a large portion of human-generated text data. It's just the stuff that's easier to get because it's preserved in bulk.
Video transcription has more or less been solved. Imagine how much data Google has in YouTube transcripts. And the longer these AI chatbots operate, the more data they manage to collect for training as well (I think Google making it so you can easily upvote or downvote a response by the bot is a good idea).