by twayt 1071 days ago
Libgen / Scihub or not, if the model can provide details about the book beyond high-level info like a summary, and no explicit deal has been made with the publisher, you can make a strong argument that it is plagiarism.

Even if bits and pieces of the book's text are scattered across the internet and you end up picking up portions of the book that way, you have still read the book.

It is extremely sad, but ChatGPT will be taken down by the end of this year and replaced by a heavily neutered model next year.


I'm not a lawyer, and obviously we won't get a definite answer unless this actually goes to court; all of this is just hand-waving and guessing.

But I think that unless GPT starts reciting large parts of a book outside the context of learning/education/research, reciting smaller snippets would fall under "fair use" and not be illegal.

For it to be fair use, they still have to have legally owned the book (as far as I understand).

You can't steal a book, photocopy some pages, then claim the photocopied pages are fair use.

I think you can. It is a separate "crime". You would face two cases: one over fair use (which would hold if you are quoting, commenting on, reviewing, or otherwise repurposing the content, and the use is in fact fair), and a second over breach of license/terms and/or illegally obtaining the work (for example, if you stole it from a bookstore).

If you recite enough small snippets, you make a large one.

Especially with ChatGPT, you can probe the model by asking specific questions about the material to see whether it has seen the entire book.

Also, the model doesn't have to be able to recite the book verbatim for the book to have been in its training set. The snippets I am referring to are on the training-data side, not in the model's output.

If I read a book and then write a summary, is that plagiarism? What's the difference? I am legitimately not familiar with copyright law, but real lawyers seem to think it is unclear whether training on copyrighted data is illegal (in Japan it's definitely not).