by martin-t
292 days ago
1) They absolutely do sometimes repeat training data verbatim.[0]

2) That's not even the point. The point is that these models are trained on stolen data without permission, while people pretend that the resulting model is not a derived work of the training data and that the output of the model plus a prompt is not a derived work of the training data either.

Point 1 is just an extreme edge case that is a symptom of point 2, and yet people still have trouble accepting it. The GPL was about user freedom. If "derived work" no longer applies as long as you run code through a sufficiently complex plagiarism automator, then plagiarism becomes unprovable and the GPL is broken. Great, we lost another freedom.

[0]: I recall a study or court document with 100 examples of multiple whole paragraphs plagiarised from the New York Times; I don't have time to look for it now.
|
Convenient. Well then, I recall two studies that said the opposite. Unfortunately pressed for time as well.