Hacker News
by constantcrying 1070 days ago
> you can prompt output that includes verbatim extracts when the copyright avoidance post-processing is disabled then you know that it has been consumed.

No, you only know that that part was likely consumed. To show that the whole text was consumed, you would need to show that the model will generate arbitrary passages from it.

And LLMs are inherently random, so proof that this happens is very difficult to obtain, and showing that something is actual model output is nearly impossible, especially if you only have API access and can't use the model directly (e.g. to fix the RNG seed).
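
To illustrate the seed point, here is a minimal Python sketch. It is only a toy: the vocabulary, the weights, and the `sample_tokens` helper are all made up for illustration and stand in for a real model's sampling step. The idea is just that if you control the RNG, the same seed reproduces the same output, which is what you would need in order to demonstrate that a given passage really came from the model.

```python
import random

# Toy next-token distribution; tokens and weights are invented for illustration.
VOCAB = ["the", "cat", "sat", "on", "mat"]
WEIGHTS = [0.4, 0.2, 0.2, 0.1, 0.1]

def sample_tokens(seed, n=8):
    """Draw n tokens from the toy distribution using a dedicated, seeded RNG."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.choices(VOCAB, weights=WEIGHTS, k=n)

run_a = sample_tokens(seed=1234)
run_b = sample_tokens(seed=1234)

# With the seed fixed, the "generation" can be replayed exactly.
print(run_a == run_b)
```

With only API access you typically cannot pin down the generator state like this, so a third party cannot replay a run and confirm the output.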

Once you have that, you can debate whether or not it is fair use.

1 comment

Arbitrary passages is what I meant by "verbatim extracts."