Hacker News
by waynesonfire 1227 days ago
When the entire work is synthesized for the purpose of training a model, it's not "limited portions."

It makes sense that there is no fixed limit on the number of words that can be used under fair use, but the limit is certainly less than all of them.


Context matters. You are taking one FAQ question and generalizing. It is permissible to reproduce or show complete works in education, depending on the context. It is also completely normal to read copyrighted work in private, learn something from it, and then answer questions about that work publicly.

The questions around LLMs learning from copyrighted material are still open and need to be settled in court. I personally imagine finding infringement would impose more harm on society and progress than letting the models acquire knowledge from these copyrighted works.

> letting the models acquire knowledge from these copyrighted works

I'm gagging at the nonsensical anthropomorphizing being done to end-run the fact that what the LLMs are doing is copying.

You make a lot of condescending or toxic remarks on HN. You might want to consider how that affects your ability to sway others with your comments.

Please chill

> the fact that what the LLMs are doing is copying.

I disagree. The training process creates token representations and weighted connections between them. The models later produce probabilistic token sequences, not so unlike what our meat bodies do, though by very different mechanisms. The fact that certain sequences can be reproduced verbatim is likely a consequence of overfitting. They certainly cannot reproduce all training data verbatim. It would be interesting to know what determines which sequences can and cannot be reproduced, and how.
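
To make that concrete, here is a toy sketch in Python. It is nothing like a real LLM: it is only a bigram counter, and the function names `train_bigram` and `generate` and both sample texts are made up for illustration. It does show the two behaviors in question: sampling the next token from learned weights, and verbatim reproduction when the training text leaves the model only one possible continuation at every step.

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, rng):
    """Sample a sequence token by token, weighted by the learned counts."""
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # no known continuation for this token
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return out

# Every token here has exactly one successor, so the only sequence the
# model can emit from "green" is the training text itself.
unique_text = "green lanterns drift over quiet harbors tonight".split()
memorized = train_bigram(unique_text)
print(generate(memorized, "green", len(unique_text), random.Random(0)))

# Here "the" has several possible successors, so the output is sampled
# and need not match the training text.
varied_text = "the cat sat on the mat and the dog sat on the rug".split()
varied = train_bigram(varied_text)
print(generate(varied, "the", 8, random.Random(0)))
```

In this toy, the first case reproduces its input exactly because there is nothing else it could do, which is a crude stand-in for overfitting. The second case only ever emits token pairs it has seen, but not necessarily in the original order.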

> The models later produce probabilistic token sequences, not so unlike what our meat bodies do, though by very different mechanisms.

Your response to me calling out your baseless anthropomorphizing was to double down on it? It's amusing to me that you don't think you are condescending.

It seems you don't know what anthropomorphizing actually means, based on your overapplication of the word.

ChatGPT can help you with that ;]