by Lockyy
1070 days ago
It seems relatively straightforward (famous last words) to assess whether actual copyrighted text is embedded within the network. If you can prompt output that includes verbatim extracts when the copyright-avoidance post-processing is disabled, then you know that the text has been consumed. Of course, whether that was purposeful or an inadvertent part of the larger training set would not be determined, but you would know that the text is in there.
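A rough sketch of what "includes verbatim extracts" could mean in practice: find the longest run of consecutive words that a model output shares with a reference text. The function name `longest_shared_run` and the sample strings are made up for illustration, and this assumes you already have the raw output as a string; it says nothing about how you would get the post-processing disabled.

```python
from typing import List


def longest_shared_run(output: str, reference: str) -> List[str]:
    """Return the longest run of consecutive words present in both texts."""
    out_words = output.lower().split()
    ref_words = reference.lower().split()
    best_len, best_end = 0, 0
    # prev[j] holds the length of the shared run ending at the previous
    # output word and reference word j-1.
    prev = [0] * (len(ref_words) + 1)
    for i in range(1, len(out_words) + 1):
        curr = [0] * (len(ref_words) + 1)
        for j in range(1, len(ref_words) + 1):
            if out_words[i - 1] == ref_words[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return out_words[best_end - best_len:best_end]


# Hypothetical strings, only to show the call.
reference = "it was the best of times it was the worst of times"
output = "the model said it was the best of times indeed"
run = longest_shared_run(output, reference)
print(len(run), " ".join(run))
```

A long shared run is only a signal. Where you set the threshold for "verbatim" is a judgment call that this sketch does not settle.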
If I create a program that picks random words from a dictionary and I end up with a seed that generates that text verbatim, does that mean my program contains the copyrighted text?
You might be able to craft an intricate prompt that just happens to recreate the copyrighted text. Run it enough times and eventually you get it verbatim.
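Here is a toy version of that thought experiment, as a sketch only. `VOCAB`, `generate`, `find_seed` and `expected_tries` are hypothetical names, and the four-word dictionary is deliberately tiny so the seed search finishes quickly.

```python
import random
from typing import List, Optional

VOCAB = ["the", "cat", "sat", "down"]  # toy dictionary, purely illustrative


def generate(seed: int, length: int) -> List[str]:
    """Pick `length` words uniformly at random from VOCAB using `seed`."""
    rng = random.Random(seed)
    return [rng.choice(VOCAB) for _ in range(length)]


def find_seed(target: List[str], max_tries: int = 100_000) -> Optional[int]:
    """Try seeds 0..max_tries-1 and return the first that reproduces `target`."""
    for seed in range(max_tries):
        if generate(seed, len(target)) == target:
            return seed
    return None


def expected_tries(vocab_size: int, length: int) -> int:
    """Number of equally likely word sequences of the given length."""
    return vocab_size ** length


target = ["the", "cat", "sat"]
seed = find_seed(target)
print(seed, generate(seed, len(target)) if seed is not None else None)
print(expected_tries(len(VOCAB), len(target)))
```

With four words and a three-word target there are only 64 possible sequences, so a matching seed turns up almost immediately. The search space grows as `vocab_size ** length`: for a 50,000-word dictionary and a 100-word passage it is on the order of 10^470.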