Hacker News
by innagadadavida 1065 days ago
Copyright laws should be amended to allow this scenario. If I read a book and write about it in a blog, it is considered review. Why shouldn’t we allow companies to do the same to train their models? Overall it will benefit society more than it hurts some rich authors.
5 comments

I think it’s a mistake/fallacy to equate the human acquisition of knowledge and resulting synthesis of value with that of large-scale computers ingesting the sum total of written human knowledge and the outcomes that enables.

They are not similar, and I suspect that if they were (i.e. humans could absorb that much information), the information landscape and the market models for exchanging value would look nothing like they do today, and AI wouldn’t be rocking the boat, it’d just be another adherent to the resulting laws.

That's one thing I'm consistently surprised HN fails to draw a distinction on: copyright regimes are fundamentally about copy rate.

You can't take a regime that works decently with human-rate copying and convert it to computer-rate copying, because fundamentally the give-and-take of rights to each side is balanced against feasible limits of reproduction.

Or, to put it another way, if you can copy/synthesize at most 1 book a day, I can extend you a lot more implicit rights... than I can afford to someone who can copy/synthesize every book ever in a day.

I think the difference is you presumably obtained that book legally before writing the review. In this case the book was pirated (the definitely illegal part), and then used for training (the possibly illegal part, but I suspect this would be deemed fair use).

IMO Google and their massive Google Books DB would have a better leg to stand on here if they trained on that dataset, as they owned physical copies of all the books.

I don't think it matters. Your review doesn't become copyright infringement just because you pirated the movie.
>Copyright laws should be amended to allow this scenario. If I read a book and write about it in a blog, it is considered review.

The problem with current AI models is that they memorize stuff: there is the case of an AI memorizing an algorithm perfectly, or reciting quotes from Dune and then getting censored.

Now you, as a paying user of these AI tools, are not writing reviews but probably using them for commercial purposes, and it would not be fair if your proprietary code ended up using code copy-pasted from GPL code.

If these AIs were so clever, then IMO you could have them learn, say, Python exactly like a human would: a few books and some exercises on Python, some books on algorithms, some books on HTML or whatever tech. But today they are trained on the whole of GitHub and you get a mix of stuff. My suggestion would also improve the sorry state of JS in ChatGPT, where it uses super old syntax and outdated patterns like it is coding for IE6. My guess is that this is because it is trained on old or bad code, and this means most of the code from now on will be old syntax and bad.

“Rich authors”.

Citation needed.

I meant the authors that are suing - if you have the money to sue, you can be considered rich? no?
Going to go with “no, you don’t need to be rich to sue”. Likewise, to be included in a class action you don’t have to pay anything, or even participate in any way; you just get a cut of the settlement.
Couldn’t they just buy the ebook and call it a day? The rich people are the people training LLMs not the authors lol
I doubt it makes a difference whether they purchase the ebook or not. And probably a bunch of them aren't even legitimately available as ebooks; people scan books and upload them to zlibrary etc.