That linked study doesn't particularly resemble Turing's test, though? The authors asked an LLM some questions (like personality tests, or econ games), then reduced the responses to low-dimensional aggregates (like into "Big Five" personality traits), and compared those aggregates against human responses to the same questions. They found those aggregates to be indistinguishable, but that aggregation throws away almost all the information a typical human interrogator would use to judge.
Turing's interrogator also gets to ask whatever questions they think will most effectively distinguish human from machine. Everything those authors asked must appear in the training set countless times (and also corresponds closely to likely RLHF targets), making it a particularly unhelpful choice.
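To make the aggregation point concrete, here is a toy sketch. It is not the study's actual scoring method; the six-item questionnaire, the answer values, and the `trait_score` helper are all made up for illustration. It just shows how two very different sets of answers can collapse to the same aggregate.

```python
from statistics import mean

# Hypothetical 1-5 Likert answers to six items scored on a single trait.
# Both the item set and the values are invented for illustration.
respondent_a = [3, 3, 3, 3, 3, 3]   # flat, identical answers throughout
respondent_b = [1, 5, 2, 4, 5, 1]   # highly varied answers

def trait_score(answers):
    """Collapse item-level answers into one aggregate score (the mean)."""
    return mean(answers)

score_a = trait_score(respondent_a)
score_b = trait_score(respondent_b)

# Compare at the aggregate level and at the item level.
same_aggregate = score_a == score_b
same_answers = respondent_a == respondent_b
print(score_a, score_b, same_aggregate, same_answers)
```

An interrogator looking at the raw answers could tell these two apart at a glance; a comparison on the aggregate alone cannot.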
Turing was a WW2-era mathematician. He had no insight into or understanding of intelligence, made no study of intelligent systems, and so on (he believed in ESP, of all things).
Turing's test is a restatement of a now-pseudoscientific behaviourism common at the time; and also, egregiously, places a dumb ape as the system which measures intelligence. If an ape can be fooled, the system is intelligent: people worshiped the sun and thought it conscious. People are desperate to analogise the world to themselves; it is a trivial thing to fool an ape on this matter.
Whatever one might make of this as a philosophical thought experiment, as a test for intelligence, it's pseudoscience. What a person might, or might not, believe about a series of words sent across a wire isn't science and it isn't relevant to a discussion about the capabilities of an AI system. It is a measure, only, of how easily deceived we are.
That has nothing to do with why Turing proposed it; nor does it have anything to do with general intelligence. This is just pseudoscience.
There's no scientific account of the capacities of a system with intelligence, no account of how these combine, no account of how communicative practices arise, etc. None. Any such attempt would immediately expose the "test" as ridiculous.
General intelligence arises as skillful adaptive control over one's environment, through sensory-motor concept acquisition, and so on.
It has absolutely nothing to do with whether you can emit text tokens in the right order to fool a user about whether the machine is a man or a woman (Turing's actual test). Nor does it have anything to do with whether you can fool a person at all.
No machine whose goal is to fool a user about the machine's intelligence has thereby any capacities. Kinda, obviously.
Turing's test not only displays a gross lack of concern to produce any capacities of intelligence in a system; as a research goal, it's actively hostile to the production of any such capacities. Since it is trivial to fool people, this requires no intelligence at all.
> General intelligence arises as skillful adaptive control over one's environment, through sensory-motor concept acquisition, and so on.
This isn't a generally accepted definition or process.
And indeed it seems to preclude people like Stephen Hawking, who had little control over his environment (or, to be pedantic, people who had similar conditions from birth).
For the purposes of my criticism of the Turing test, any discussion whatsoever about what capacities ground intelligence is already entertaining what Turing ruled out. He made the extremely pseudoscientific behaviourist assumption that no such science was required, that intelligent agents are just input-output relata on thin I/O boundaries.
Any even plausible scientific account of what capacities ground intelligence would render this view false. Whatever capacities you want to grant, no plausible ones are compatible with Turing's view or the Turing test.
Consider imagination. You can replace a faculty to imagine with a set of models of ({prompt, reply},) histories for a human observer who is only concerned with those prompts and those replies. But as soon as anything changes in the world, you have to imagine novel things (e.g., SpaceX is founded, we visit Mars, a new TV show is released...). So questions such as, "what would the latest SpaceGuys TV show be like if Elon had just launched BlahBlahRocket5?" cannot be given fit answers. These require the actual faculty of imagination, along with being in the world and so on.
As soon as you enter a sincere scientific attempt to characterise these features, you see immediately that whilst modelling historical frequencies of human-produced data can fool humans, it cannot impart these capacities.
> So questions such as, "what would the latest SpaceGuys TV show be like if Elon had just launched BlahBlahRocket5?" cannot be given fit answers
I don't understand this at all. ChatGPT can do a great job imagining a world like this right now, and there is no substantial difference in the output of an LLM-based "imagination" vs a human-based "imagination".
> These require the actual faculty of imagination, along with being in the world and so on.
I think you are implying by this that human imagination requires a consistent world model, and that because LLMs don't really have this they can't be intelligent. Apologies if I have misinterpreted this!
But human imagination isn't consistent at all (as anyone who has edited a fiction story will tell you). Our creative imagination process generates wrong thoughts all the time, and then we self-critique and correct it. It's quite possible for LLMs to do this fine too!
Basically I think my point is that I believe a perfect simulation of intelligence is intelligence, whereas I suspect you don't think it is, maybe?
Yea we don't have any science of intelligence, the only thing we have is empirical data. Testing to see what works. That's why Turing tests are quite fundamental imo.