by ninetyninenine
847 days ago
No, it's not a misunderstanding. Without a concrete definition of a metric, comparisons are impossible, because everything is based on wishy-washy conjectures about vague and fuzzy concepts. Hard metrics bring in quantitative data. They show hard differences. Even if the metric is some side marker that is later found to have poor correlation or causation with the thing being measured, the hard metric is still valid. Take IQ. We assume IQ measures intelligence. But in the future we may determine that no, it doesn't measure intelligence well. That doesn't change the fact that IQ tests still measured something. The score still says something definitive. My test is similar to the Turing test. But so is yours. In the end there's a human in the loop making a judgment call.
|
In your final paragraph, you attempt to suggest that my proposed test is no better than the Turing test (and therefore no better than what you are doing), but as you have not addressed the ways in which my proposal differs from the Turing test, I regard this as merely waffling on the issue. In practice, it is not so easy to come up with tests for whether a human understands an issue (as opposed to having merely committed a bunch of related propositions to memory), and I am trying to capture the ways in which we can make that call.
You entered this debate saying "I think we are way past the point of debate here. LLMs are not stochastic parrots. LLMs do understand an aspect of reality", yet your post here ends with "in the end there's a human in the loop making a judgment call", explicitly acknowledging that your strong initial claims are matters of opinion, rather than established facts supported by hard metrics.