|
|
|
|
|
by mannykannot
846 days ago
|
|
I had no intention of implying any malfeasance in my use of the word "odd"; I mean it in the sense of unusual, unexpected and surprising. The thing is, you finishished your precursor post saying, about your tests and mine, that it comes down to there being a human in the loop making a judgement call, but in a follow-on you say that there are thousands of quantitative metrics. Why, I wondered, would that matter, if it comes down to a human making a judgement call? Were you switching to a different line of argument, one that (as far as I could tell) had not been raised before? That's what I found surprising about your claim. I am still rather confused about how this fits into what you are saying more generally. At first I thought you were saying, in your latest post, that the Turing-test interrogator should be restricted to asking questions from the sets having quantitative metrics in order for it to be an objective process, but that doesn't really hold up, as far as I can see. Frankly, I suspect that the tests with objective metrics are beside the point, and the essence of your position is contained within your final paragraph: "If the LLM can deceive the human [then] that is a hard True/False quantitative metric [and the only sort we can get]." If so, then (no surprise) I think there are some problems with it, but before I go further, I would like to check that I understand your position. |
|
It matters because of humans. If I gave an LLM thousands of quantitative tests and it passed them all but in an hour long conversation a human could identify it was an LLM through some flaw the human would consider all those tests useless. That's why it matters. The human making a judgement call is still a quantitative measurement btw as you can limit human output to True or False. But because every human is different in order to get good numbers you have to do measurements with multitudes of humans.
>I am still rather confused about how this fits into what you are saying more generally. At first I thought you were saying, in your latest post, that the Turing-test interrogator should be restricted to asking questions from the sets having quantitative metrics in order for it to be an objective process, but that doesn't really hold up, as far as I can see.
it can still be objective with a human in the loop assuming the human is honest. What's not objective is a human offering an opinion in the form of a paragraph with no definitive clarity on what constitutes a metric. I realize that elements of MY metric have indeterminism to it, but it is still a hard metric because the output is over a well defined set. Whenever you have indeterminism you would then turn to probability and many samples in order to produce a final quantitative result.
>If so, then (no surprise) I think there are some problems with it, but before I go further, I would like to check that I understand your position.
yes my position is that exactly. If all observable qualities indicate it's a duck, then there's nothing more you can determine beyond that, scientifically speaking. You're implying there is a better way?