While this may be true, the reverse is also true. And even if it’s legal, there are other ways to frame this that are worth considering, e.g. it could technically be legal but not in accordance with the spirit of the law, in which case updates to the law are required. The fact that the model is legal is then an additional problem on top of the gap in the law.
I think my main point here is that “legal” does not imply moral or acceptable to society, and our understanding of the technical legal status is not a prerequisite for exploring those factors, which may be the thing that changes the legal status in response to the major shift in landscape.
Right, but if you have a plausible case that you weren't breaking the law and it was a legal unknown, the most that will happen is "we've decided this is officially illegal, stop doing it."
You risk nothing by assuming things are legal until explicitly illegal.
If you limit the framing of the conversation to that of an amoral corporate entity, sure. But I don’t think there was ever a question that companies can legally do things that are potentially (or unequivocally) distasteful if not outright unethical/immoral.
More interesting is the broader conversation which involves society’s response to a major shift in the information economy, new questions about what role these tools should play, and how laws should evolve accordingly.
The factors surrounding the emergence/unfolding of AI tooling can’t be stripped down to just the corporate interests involved.
Copyright laws should be amended to allow this scenario. If I read a book and write about it in a blog, it is considered review. Why shouldn’t we allow companies to do the same to train their models? Overall it will benefit society more than it hurts some rich authors.
I think it’s a mistake/fallacy to equate the human acquisition of knowledge and resulting synthesis of value with that of large-scale computers ingesting the sum total of written human knowledge and the outcomes that enables.
They are not similar, and I suspect that if they were (i.e. humans could absorb that much information), the information landscape and the market models for exchanging value would look nothing like they do today, and AI wouldn’t be rocking the boat, it’d just be another adherent to the resulting laws.
That's one thing I'm consistently surprised HN fails to draw a distinction on: copyright regimes are fundamentally about copy rate.
You can't take a regime that works decently with human-rate copying and convert it to computer-rate copying, because fundamentally the give-and-take of rights to each side is balanced against feasible limits of reproduction.
Or, to put it another way, if you can copy/synthesize at most 1 book a day, I can extend you a lot more implicit rights... than I can afford to someone who can copy/synthesize every book ever in a day.
I think the difference is you presumably obtained that book legally before writing the review. In this case the book was pirated (the definitely illegal part), and then used for training (the possibly illegal part, but I suspect this would be deemed fair use).
IMO Google and their massive Google Books DB would have a better leg to stand on here if they trained on that dataset, as they owned physical copies of all the books.
>Copyright laws should be amended to allow this scenario. If I read a book and write about it in a blog, it is considered review.
The problem with current AI is that it memorizes stuff: there is the case of an AI memorizing an algorithm perfectly, or reciting quotes from Dune and then getting censored.
Now, as a paying user of these AI tools, you are not writing reviews but probably using them for commercial purposes, and it would not be fair if your proprietary code used code copy-pasted from GPL code.
If this AI were so clever then IMO you could have it learn, say, Python exactly like a human would: a few books and some exercises on Python, some books on algorithms, some books on HTML or whatever tech. But today they train on the whole of GitHub and you get a mix of stuff. My suggestion would also improve the sorry state of JS in ChatGPT, where it uses super old syntax and still uses outdated patterns like it is coding for IE6. My guess is that this is because it is trained on old or bad code, and this means most of the code from now on will be old syntax and bad.
Going to go with “no, you don’t need to be rich to sue”. Likewise, to be included in a class action you don’t have to pay anything, or even participate in any way; you just get a cut of the settlement.
I doubt it makes a difference whether they purchase the ebook or not. And probably a bunch of them aren't even legitimately available as ebooks; people scan books and upload them to zlibrary etc.