by john-h-k 384 days ago
> General point: it's impossible to prove anything based on an LLM's response since it's impossible to distinguish a true LLM statement from a false one.

This seems true but sort of vacuous. Obviously an arbitrary statement, much like one made by a human, can only be determined "true"/"false" by rigorous first-order logic.

But outside of binary T/F, wouldn't "Grok says it is Claude 3.5 Sonnet, yet other LLMs do not" make you update your chance that Grok is actually just Claude 3.5 Sonnet?

I wouldn't say I believe it with much conviction. But it seems irrational to not believe it _somewhat more_ after seeing this.


> Wouldn't "Grok says it is Claude 3.5 Sonnet, yet other LLMs do not" make you update your chance that Grok is actually just Claude 3.5 Sonnet?

Not if you're familiar with Large Language Models.

As an example, "R1 distilled Llama" is a model trained by Meta and fine-tuned on DeepSeek R1 outputs, but if you ask it, it claims to be trained by OpenAI.

Right. But across all pairings of mainstream LLMs, it seems a model is more likely to say “yes, I am X” when it is X than when it isn’t X, even if it still has a high chance of being wrong.

Which means you should (as a Bayesian actor) treat it saying “I am X” as evidence that it is X, and update accordingly.
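
To make the update concrete, here is a rough sketch of the arithmetic. Every number below is made up for illustration (nobody in this thread measured them), and `posterior` is just a hypothetical helper name. The only thing it shows is that whenever "I am X" is more likely from a model that is X than from one that isn't, the posterior lands above the prior, even if it stays small.

```python
def posterior(prior, p_claim_if_x, p_claim_if_not_x):
    """Bayes' rule: P(model is X | model says "I am X")."""
    # Total probability of hearing the claim at all.
    evidence = prior * p_claim_if_x + (1 - prior) * p_claim_if_not_x
    return prior * p_claim_if_x / evidence


# Hypothetical inputs, chosen only to illustrate the shape of the update:
prior = 0.01             # assumed prior that the model is secretly X
p_claim_if_x = 0.60      # assumed chance it says "I am X" when it is X
p_claim_if_not_x = 0.05  # assumed chance it says "I am X" when it is not

p = posterior(prior, p_claim_if_x, p_claim_if_not_x)
print(f"prior = {prior:.3f}, posterior = {p:.3f}")  # posterior is about 0.108
```

With these assumed numbers the belief goes from 1% to roughly 11%: still unlikely, but somewhat more believable than before, which is all the argument above claims. If the two likelihoods were equal, the claim would carry no information and the posterior would equal the prior.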