by puttycat
384 days ago
General point: it's impossible to prove anything based on an LLM's response since it's impossible to distinguish a true LLM statement from a false one. There's no way to know whether it outputs Claude because it really is or because it just thinks it's probable given the question.
This seems true but sort of vacuous. Obviously an arbitrary statement, whether it comes from an LLM or a human, can only be determined "true"/"false" by rigorous first-order logic.
But outside of binary T/F, wouldn't "Grok says it is Claude 3.5 Sonnet yet other LLMs do not" make you update your estimate of the chance that Grok is actually just Claude 3.5 Sonnet?
I wouldn't say I believe it with much conviction. But it seems irrational to not believe it _somewhat more_ after seeing this.
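
To put rough numbers on "somewhat more", here is a minimal Bayes' rule sketch. Every number in it is made up for illustration (the prior and both likelihoods are assumptions, not measurements), and `posterior` is just a helper name I picked:

```python
def posterior(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' rule: P(H | E) given P(H), P(E | H), and P(E | not H)."""
    evidence = prior * p_obs_if_true + (1 - prior) * p_obs_if_false
    return prior * p_obs_if_true / evidence

# Hypothetical inputs, chosen only to illustrate the direction of the update.
prior = 0.01              # assumed P(Grok is Claude 3.5 Sonnet) before the observation
p_claim_if_claude = 0.9   # assumed P(it says "I am Claude" | it really is Claude)
p_claim_if_not = 0.05     # assumed P(it says "I am Claude" | it is not Claude)

updated = posterior(prior, p_claim_if_claude, p_claim_if_not)
print(f"prior={prior:.3f} posterior={updated:.3f}")  # prior=0.010 posterior=0.154
```

With these assumed inputs the belief goes from 1% to about 15%: far from conviction, but clearly higher than before. The fact that other LLMs don't make the same claim is what justifies keeping `p_claim_if_not` low; if every model said it was Claude, the two likelihoods would be close and the update would be near zero.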