by ninetyninenine 845 days ago
>I'm still a bit puzzled here, because it seems to me that the paragraph continuing from here is making the argument that LLM performance on these tests doesn't matter, as far as the question is concerned: in this paragraph you seem to be saying (paraphrased) that despite LLMs' impressive performance on these quantitative tests, they could still fail Turing tests, so their performance on these quantitative tests is not decisive.

It matters in the quantitative sense. It measures AI performance. What it won't do is matter to YOU. Because you're a human and humans will keep moving the bar to a higher standard, right? When AI shot past the Turing test, humans just moved the goalposts. So to convince someone like YOU we have to look at the final metric: the point where LLM I/O becomes indistinguishable from, or superior to, that of humans. And if you look at the last decade, AI is rapidly approaching that final bar.

>The impression I get from what you have written in this post is that you are not claiming that a test conforming to your requirements has actually been successfully performed, you are just assuming it could be?

Whether I assume or don't assume, the projection of the trendline currently indicates that it will. Given the trendline that is the most probable conclusion.

>The experiment does not explicitly address it.

Nothing on the face of the earth can address the question. Because nobody truly knows what "understanding" something actually is. You can't even state the definition formally enough for it to be encoded in a computer program.

So I went to the next best possibility, which is my point. The point is ALTHOUGH we don't know what understanding is, we ALL assume humans understand things. So we set that as a bar metric. Anything indistinguishable from a human must understand things. Anything that appears close to a human but is not quite human must understand things ALMOST as well as a human.


> What it won't do is matter to YOU. Because you're a human and humans will keep moving the bar to a higher standard, right? When AI shot past the Turing test, humans just moved the goalposts. So to convince someone like YOU we have to look at the final metric.

It is disappointing to see you descending into something of a rant here. If you knew me better, you would know that I spend more time debating in opposition to people who think they can prove that AGI/artificial consciousness is impossible than I do with people who think it is an undeniable fact that it has already been achieved (though this discussion is shifting the balance towards the middle, if only briefly). Just because I approach arguments in either direction with a degree of skepticism, and I don't see any value in trying to call the arrival of true AGI at the very first moment it occurs, it does not mean that I'm trying (whether secretly or openly) to deny that it is possible, either in the near term or at all. FWIW, I regard the former as possible and the latter as highly probable, so long as we don't self-destruct first.

> Nothing on the face of the earth can address the question. Because nobody truly knows what "understanding" something actually is. You can't even state the definition formally enough for it to be encoded in a computer program.

The anti-AI folk I mentioned above would willingly embrace this position! They would say that it shows that human-like intelligence and consciousness lie outside the scope of the physical sciences, and that this creates the possibility of a type of p-zombie that is indistinguishable by physical science from a human and yet lacks any concept of itself as an entity within an external world.

More relevantly, your response here repeats an earlier fallacy. In practice, concepts and their definitions are revised, tightened, remixed and refined as we inquire into them and gain knowledge. I know you don't agree, but as this is not an opinion but an empirical observation, validated by many cases in the history of science and science-like disciplines, I don't see you prevailing here - and there's the knowledge-bootstrap problem if this were not the case, as well.

It occurred to me this morning that there's a variant or extension of the quantitative Turing test which goes like this:

We have two agents and a judge. The judge is a human and the agents are either a pair of humans, a pair of AIs, or one of each, chosen randomly and without the judge being aware of the mix. One of the agents is picked, by random choice, to start a discussion with the other with the intent of exploring what the other understands about some topic, with the discussion-starter being given the freedom to choose the topic. The discussion proceeds for a reasonable length of time - let's say one hour.

The judge follows the discussion but does not participate in it. At the conclusion of the discussion, the judge is required to say, for each agent, whether it is more likely that it is a human or AI, and the accuracy of this call is used to assign a categorical variable to the result, just as in the version of the Turing test you have described.
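
A minimal Python sketch of how the scoring for the test described above could be tallied. Everything named here is hypothetical and chosen for illustration: drawing each agent's kind with an independent coin flip is one assumed way of choosing the mix randomly, the three result labels are an assumed form of the categorical variable, and `simulated_judge` (with its `skill` parameter) is only a stand-in for the human judge so that the sketch runs.

```python
import random
from collections import Counter

KINDS = ("human", "ai")


def draw_pair(rng):
    """Assumption: each agent's kind is an independent, fair coin flip."""
    return (rng.choice(KINDS), rng.choice(KINDS))


def simulated_judge(truth, skill, rng):
    """Stand-in for the human judge: each call is right with probability `skill`."""
    calls = []
    for kind in truth:
        other = "ai" if kind == "human" else "human"
        calls.append(kind if rng.random() < skill else other)
    return tuple(calls)


def score_trial(truth, calls):
    """Map the judge's two calls to an (assumed) categorical result."""
    correct = sum(t == c for t, c in zip(truth, calls))
    return ("both_wrong", "one_right", "both_right")[correct]


def run_trials(n, skill, seed=0):
    """Run n independent trials and count how often each category occurs."""
    rng = random.Random(seed)
    results = Counter()
    for _ in range(n):
        truth = draw_pair(rng)
        calls = simulated_judge(truth, skill, rng)
        results[score_trial(truth, calls)] += 1
    return results


if __name__ == "__main__":
    # skill=0.5 models a judge who cannot tell humans and AIs apart at all.
    print(run_trials(1000, skill=0.5))
```

In a real run the judge's calls would come from a person who had followed the hour-long discussion, not from `simulated_judge`; only `score_trial` and the tally would carry over.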

This seems just as quantitative, and in the same way, as your version, yet there's no reason to believe it will necessarily yield the same results. More tests are better, so what's not to like?

>It is disappointing to see you descending into something of a rant here.

I'm going to be frank with you. I'm not ranting, and uncharitable comments like this aren't appreciated. I'm going to respond to your reply later in another post, but if I see more stuff like this I'll stop communicating with you. Please don't say stuff like that.

I could have, equally reasonably, made exactly the same response to your post. I will do my best to respond civilly (I admit that I have some failings in this regard), but I also suggest that whenever you feel the urge to capitalize the word "you", you give it a second thought.

Apologies, by YOU I mean YOU as a human, not YOU as an individual. We all generally feel that the quantitative tests aren't enough. The capitalization was for emphasis, for you to look at yourself and know that you're human and likely feel the same thing. Most people would say that stuff like IQ tests isn't enough, and we can't pinpoint definitively why; as humans, WE (keyword change) just feel that way.

That feeling is what sets the bar. There's no rhyme or reason behind it. But humans are the ones who make the judgment call, so that's what it has to be.

I will respond more later when I have time.

No problem! I anticipate the rest of your response with the expectation that it will be challenging and thought-provoking.

As for your test, I don't see it offering anything new. I see it as the same as my test, just with extra complexity. From a statistical point of view I feel it will yield roughly the same results as my test, as long as the judge outputs a binary human-or-AI call for each entity.

Yes, I did say we can't define understanding. But despite the fact that we can't define it, we still, counterintuitively, "know" when something has the capability of understanding. We say all humans have the capability of understanding.

This is the point. The word is undefined yet we can still apply the word and use the word and "know" whether something can understand things.

Thus we classify humans as capable of understanding things without any rhyme or reason. This is fine. But if you take this logic further, that means anything that is indistinguishable from a human must fit into this category.

That was my point. This is the logical limit of how far we can go with an undefined word. To be consistent in our application of the word "understanding", we must apply it to AI if AI is indistinguishable from humans. If we don't do this, then our reasoning is inconsistent. All of this can be done without even having a definition of the word "understanding".