by john-h-k
384 days ago
> > General point: it's impossible to prove anything based on an LLM's response, since it's impossible to distinguish a true LLM statement from a false one.
>
> This seems true but sort of vacuous. Obviously an arbitrary statement, much like one made by a human, can only be determined "true" or "false" by rigorous first-order logic. But outside of a binary true/false, wouldn't "grok says it is Claude 3.5 Sonnet, yet other LLMs do not" make you raise your estimate of the chance that grok is actually just Claude 3.5 Sonnet? I wouldn't say I believe it with much conviction. But it seems irrational not to believe it _somewhat more_ after seeing this.
Not if you're familiar with large language models.
As an example, "R1 distilled Llama" is a Meta-trained Llama model fine-tuned on DeepSeek R1 outputs, yet if you ask it, it claims to have been trained by OpenAI.