by svara 879 days ago
Since you say you're knowledgeable on this, here's a question: If you have access to the model, wouldn't it be possible to inspect the sequence of token probabilities for a piece of text and derive from this a probability that the text was produced by that model at a given temperature? It would seem intuitive that the exact token probabilities are model specific and can be used to identify a model from its output given enough data.

I suppose an issue with this might be that an unknown prompt would add a lot of "hidden" information, but you could probably start from a guess or multiple guesses at the prompt.
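
To make the idea concrete, here is a minimal sketch of the scoring step, assuming you already have the model's raw logits for each position of the text. The logits, token ids, and function names below are made up for illustration; a real setup would get the logits from a forward pass of the model on the text plus a guessed prompt.

```python
import math

def log_softmax(logits, temperature):
    # Divide logits by the temperature, then normalize in log space
    # (subtracting the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]

def sequence_log_prob(logits_per_step, token_ids, temperature=1.0):
    # Sum of log P(token_t | context) over the observed tokens.
    total = 0.0
    for logits, tok in zip(logits_per_step, token_ids):
        total += log_softmax(logits, temperature)[tok]
    return total

# Hypothetical logits over a 3-token vocabulary, one row per position.
logits_per_step = [
    [2.0, 0.5, -1.0],
    [0.1, 1.5, 0.3],
    [-0.5, 0.0, 2.2],
]
token_ids = [0, 1, 2]  # the tokens that actually appear in the text

# Score the same text under several candidate temperatures.
for t in (0.5, 1.0, 1.5):
    print(t, sequence_log_prob(logits_per_step, token_ids, t))
```

Comparing the scores across candidate temperatures (and candidate prompts) gives a likelihood for each guess, which is the quantity the question is asking about.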


That's pretty much how most of these methods work. It just doesn't work very well, because a good model assigns a reasonable probability to lots of different texts, so you don't get very different numbers for AI-generated and human-written text. After all, the models are trained to learn exactly the probability distribution of human text.
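
A toy illustration of why the numbers end up so close, under strong simplifying assumptions: a single-token "text", a 3-word vocabulary, and made-up distributions. It compares the expected log-probability the model assigns to its own samples with what it assigns to human samples. The names `detection_gap`, `poor_model`, and `good_model` are hypothetical.

```python
import math

def expected_log_prob(source, model):
    # Average log-probability the model assigns to tokens drawn from `source`.
    return sum(p * math.log(q) for p, q in zip(source, model) if p > 0)

def detection_gap(human, model):
    # Difference between the model's average score on its own samples
    # and its average score on human samples.
    return expected_log_prob(model, model) - expected_log_prob(human, model)

human = [0.5, 0.3, 0.2]          # made-up "human" token distribution
poor_model = [0.8, 0.15, 0.05]   # a model that fits human text badly
good_model = [0.52, 0.29, 0.19]  # a model that fits human text closely

print(detection_gap(human, poor_model))
print(detection_gap(human, good_model))
print(detection_gap(human, human))  # a perfect model
```

The closer the model's distribution is to the human one, the smaller the gap, and for a perfect fit there is nothing left to separate the two.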