| HN Mirror

by john-h-k 431 days ago

> General point: it's impossible to prove anything based on an LLM's response since it's impossible to distinguish a true LLM statement from a false one.

This seems true but sort of vacuous. Obviously an arbitrary statement from an LLM, much like one from a human, can only be determined "true" or "false" by rigorous first-order logic.

But outside of binary T/F, wouldn't "Grok says it is Claude 3.5 Sonnet yet other LLMs do not" make you update your estimate of the chance that Grok is actually just Claude 3.5 Sonnet?

I wouldn't say I believe it with much conviction. But it seems irrational to not believe it _somewhat more_ after seeing this.

1 comments

gkbrk 431 days ago

> Wouldn't "Grok says it is Claude 3.5 Sonnet yet other LLMs do not" make you update your chance that Grok is actually just Claude 3.5 Sonnet?

Not if you're familiar with Large Language Models.

As an example, "R1 distilled Llama" is a Llama model trained by Meta and then fine-tuned on DeepSeek R1 outputs, but if you ask it, it claims to be trained by OpenAI.

john-h-k 431 days ago

Right. But across all pairs of mainstream LLMs, it seems a model is more likely to say “yes I am X” when it is X than when it isn’t X, even if it still has a high chance of being wrong.

Which means you should (as a Bayesian actor) update on it saying “I am X” as evidence that it is X.
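
To make the update concrete, here is a rough sketch of Bayes' rule applied to this case. All three probabilities are made-up numbers for illustration only, not measurements of any real model, and the function name `posterior` is just a label chosen here. The only assumption carried over from the argument is that a model is more likely to say “I am X” when it is X than when it isn't.

```python
def posterior(prior: float, p_claim_if_x: float, p_claim_if_not_x: float) -> float:
    """Return P(model is X | model says "I am X") using Bayes' rule."""
    # Total probability of hearing the claim, whether or not the model is X.
    evidence = p_claim_if_x * prior + p_claim_if_not_x * (1.0 - prior)
    return p_claim_if_x * prior / evidence


# Hypothetical numbers, chosen only to illustrate the direction of the update.
prior = 0.01             # belief that the model is X before asking it
p_claim_if_x = 0.60      # chance it says "I am X" when it really is X
p_claim_if_not_x = 0.05  # chance it says "I am X" when it is not X

updated = posterior(prior, p_claim_if_x, p_claim_if_not_x)
print(f"prior={prior:.3f} posterior={updated:.3f}")
```

With these invented inputs the posterior ends up higher than the prior but still well below 50%, which is the "believe it somewhat more, without much conviction" position. If the two claim probabilities were equal, the claim would carry no information and the posterior would equal the prior.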
