by eternalban
1131 days ago
> What exactly makes anyone think that they can detect an LLM that is outputting text? The notion seems absurd yet it keeps coming up.

My sense of the general idea (non-authoritative):

Since the sequence emitted by an LLM is a probabilistic completion, i.e. predict the next word, the examiner can do the same by progressively processing the text. Given the assumption that the semantic relations extracted from the training corpus should be fairly universal for a given domain at the output level (even though distinct LLMs will likely have distinct embedding spaces), the examiner LLM should be able to assign probabilities to the predicted words. The idea is that genuine human-produced text will have idiosyncrasies that are -not- probabilistically optimal, and the examiner can establish a sort of 'distance from probable mean' measure, with the expectation that LLM-produced text should be 'closer' to the examiner's predictions of 'the next word'.

The problem (if the above is correct) is then the missing 'prompt' and the meta-instructions embedded therein. Those should ("engineering") affect the output, possibly skewing the distance measure, thus defeating the examiner.

But of course, say in the context of academia, the examiner can 'guess' at some aspects of the prompt as well. For example, if you are examining papers for a specific assignment, the examiner can self-prompt as well: "An essay on Hume's position on the knowledge of the self".
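To make the 'distance' idea concrete, here is a rough sketch. Big caveat: the examiner here is a toy bigram model with add-one smoothing standing in for an LLM, and the names `train_bigram` and `mean_surprisal` are made up for illustration. The measure is just the average negative log-probability the examiner assigns to each actual next word; lower means 'closer' to what the examiner would have predicted.

```python
import math
from collections import Counter


def train_bigram(corpus_tokens):
    """Build a toy 'examiner': bigram counts over a reference corpus."""
    contexts = Counter(corpus_tokens[:-1])  # how often each word is followed by something
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab = set(corpus_tokens)
    return contexts, bigrams, vocab


def mean_surprisal(tokens, model):
    """Average -log p(next word | previous word), in nats per word."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot for words the examiner never saw
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        # Add-one smoothing so unseen word pairs get a small nonzero probability.
        p = (bigrams[(prev, nxt)] + 1) / (contexts[prev] + v)
        total -= math.log(p)
    return total / max(len(tokens) - 1, 1)


# Hypothetical reference text the examiner was 'trained' on.
corpus = ("the self is a bundle of perceptions and "
          "the self is not a simple substance").split()
model = train_bigram(corpus)

predictable = "the self is a bundle of perceptions".split()
idiosyncratic = "perceptions bundle a is self the of".split()

score_predictable = mean_surprisal(predictable, model)
score_idiosyncratic = mean_surprisal(idiosyncratic, model)
print(score_predictable < score_idiosyncratic)
```

With a real LLM as examiner you would condition on the guessed prompt (e.g. the assignment text) before scoring the essay. Where to put the cutoff between 'human' and 'LLM' is the hard part, and this sketch does not address it.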