Hacker News
by gruez 497 days ago
>One of the big problems is that training is a mechanical process, so there is a direct line between the copyrighted works and the model's output, regardless of the form of the output. Just on those terms it is very likely to be a copyright violation. Even if they don't reproduce substantive portions, what they do reproduce is a derived work.

Google generating thumbnails and Google scanning books are both arguably "mechanical" processes, and both have been ruled fair use.