by water-data-dude
161 days ago
It'd be difficult to prove that you hadn't leaked information to the model. The big gotcha of LLMs is that you train them on BIG corpuses of data, which means it's hard to say "X isn't in this corpus" or "this corpus only contains Y". You could TRY to assemble a set of training data that only contains text from before a certain date, but it'd be tricky as heck to be SURE about it. Ways data might leak to the model that come to mind: misfiled/mislabeled documents, footnotes, annotations, document metadata.