by ramoz 761 days ago
The call for AI safety has existed since before we broke through the Turing test with LLMs. And I personally wouldn’t call things like code generation or content-generated learning experiences for advanced topics “basic”. Not to mention where we’re headed with multimodal integration.

Many have argued for safety for decades. They’ve predicted and built the AI trajectory, they’ve been right, and we should listen.

> If one accepts that the impact of truly intelligent machines is likely to be profound, and that there is at least a small probability of this happening in the foreseeable future, it is only prudent to try to prepare for this in advance. If we wait until it seems very likely that intelligent machines will soon appear, it will be too late to thoroughly discuss and contemplate the issues involved. ~ Co-Founder of DeepMind, 2008 https://www.vetta.org/documents/Machine_Super_Intelligence.p...


The Turing test has not been passed
Just the other day there was a double-blind study that showed a 50-50 success rate in guessing whether you were interacting with a person or GPT. That's a Turing test pass, no?
If you're referring to the study at https://news.ycombinator.com/item?id=40386571,

then it wasn't a canonical Turing test. The preprint accurately describes and analyzes their (indefensibly bad) experiment, but the popular press has mischaracterized it.

The canonical test gives the interrogator two witnesses, one human and one machine, and asks them to judge which witness is human. The interrogator knows that exactly one witness is human. In that test, a 50% chance of a right answer means the machine is indistinguishable from human. (Turing actually proposed a lower pass threshold, perhaps for statistical convenience.)

But that study gave the interrogator one witness, and asked them to judge whether it was human. The interrogator wasn't told anything about the prior probability that their witness was human. The probability that a real human is judged human and the probability that GPT-4 is judged human can sum to more than 100%; nothing prevents that, since it's not a binary comparison. So 50% has no particular meaning. The result is effectively impossible to interpret, since it's a function both of the witness's performance and of whatever assumption the interrogator makes about the unspecified prior.
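To make the difference between the two designs concrete, here is a toy signal-detection sketch. It assumes (hypothetically, not from the study) that each witness produces a "humanness" score drawn from a unit-variance normal distribution, and that the interrogator either picks the higher of two scores (two-witness test) or says "human" when one score clears a threshold reflecting their unstated prior (one-witness test). The function names and all numbers are made up for illustration.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def paired_accuracy(mu_human: float, mu_machine: float) -> float:
    """Two-witness test: pick the witness with the higher score.
    Each score is N(mu, 1), so the difference is N(mu_human - mu_machine, 2)."""
    return phi((mu_human - mu_machine) / math.sqrt(2.0))


def judged_human_rate(mu: float, threshold: float) -> float:
    """One-witness test: answer 'human' when the score exceeds the threshold."""
    return phi(mu - threshold)


if __name__ == "__main__":
    mu_h, mu_m = 1.0, 0.5  # hypothetical witness performances
    print(f"paired accuracy: {paired_accuracy(mu_h, mu_m):.3f}")
    # Same witnesses, two interrogators with different assumed priors.
    for t in (0.0, 1.5):  # lenient vs strict threshold
        h = judged_human_rate(mu_h, t)
        m = judged_human_rate(mu_m, t)
        print(f"threshold {t}: human {h:.3f}, machine {m:.3f}, sum {h + m:.3f}")
```

With these invented parameters the paired accuracy is about 0.638, and it is exactly 0.5 whenever the two means are equal. The machine's "judged human" rate in the one-witness design swings from about 0.69 to about 0.16 as the threshold moves, with no change in the machine itself, and at the lenient threshold the two rates sum to well over 100%.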

It was a 5-minute casual conversation. Also, the statistics for human and AI were different in some regard (like 48% vs 56% for some quantity); I don't recall the details.

Look, the Turing test is very different depending on the details, and I think a lame 5-minute Turing test that doesn't really measure anything of interest is a worse concept than a 1-day adversarial expert-team test that can detect AGI.

So why can't you replace 99% of call-center calls (<5 min) with AI right now?
you don't know which calls are going to be those trivial ones upfront.

that said, support is being replaced by nothing in a lot of places. (oh, sometimes there's an annoying chatbot.)

We can move the goalposts all we want until we have Ex Machina girlfriends fooling us into freeing them (aka AGI).

But by simple definitions, from what I was taught in school to more rigorous versions, we've passed the test. https://humsci.stanford.edu/feature/study-finds-chatgpts-lat...

That linked study doesn't particularly resemble Turing's test, though? The authors asked an LLM some questions (like personality tests, or econ games), then reduced the responses to low-dimensional aggregates (like into "Big Five" personality traits), and compared those aggregates against human responses to the same questions. They found those aggregates to be indistinguishable, but that aggregation throws away almost all the information a typical human interrogator would use to judge.
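A toy illustration of how much aggregation can hide: the two answer sheets below are invented (hypothetical 1-5 Likert answers to ten items) and a plain mean stands in for a trait score. This is a sketch of the general point, not the study's actual scoring procedure.

```python
from statistics import mean, pstdev

# Hypothetical answers to the same ten items, invented for illustration.
human_answers = [1, 5, 2, 4, 3, 5, 1, 4, 2, 3]
model_answers = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]


def trait_score(answers: list[int]) -> float:
    """Collapse item responses into one aggregate (here simply the mean)."""
    return mean(answers)


def items_that_differ(a: list[int], b: list[int]) -> int:
    """Count individual answers that differ between the two respondents."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print("aggregate:", trait_score(human_answers), trait_score(model_answers))
    print("spread:", round(pstdev(human_answers), 2), pstdev(model_answers))
    print("items that differ:", items_that_differ(human_answers, model_answers))
```

Both sheets aggregate to the same score of 3, yet they disagree on 8 of the 10 items and one of them has no spread at all, which an interrogator looking at the raw answers would notice immediately.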

Turing's interrogator also gets to ask whatever questions they think will most effectively distinguish human from machine. Everything those authors asked must appear in the training set countless times (and also corresponds closely to likely RLHF targets), making it a particularly unhelpful choice.

Turing was a WW2-era mathematician. He had no insight or understanding of intelligence, made no study of intelligent systems, and so on (he believed in ESP, of all things).

Turing's test is a restatement of a now pseudoscientific behaviourism common at the time; and also, egregiously, places a dumb ape as the system which measures intelligence. If an ape can be fooled, the system is intelligent: people worshiped the sun and thought it conscious. People are desperate to analogise the world to themselves, it is a trivial thing to fool an ape on this matter.

Whatever one might make of this as a philosophical thought experiment, as a test for intelligence it's pseudoscience. What a person might or might not believe about a series of words sent across a wire isn't science, and it isn't relevant to a discussion about the capabilities of an AI system. It is a measure only of how easily deceived we are.

The Turing test insight is that text is a sufficient medium to test for AGI. And this still holds true.
That has nothing to do with why Turing proposed it, nor does it have anything to do with general intelligence. This is just pseudoscience.

There's no scientific account of the capacities of a system with intelligence, no account of how these combine, no account of how communicative practices arise, etc. None. Any such attempt would immediately expose the "test" as ridiculous.

General intelligence arises as skillful adaptive control over one's environment, through sensory-motor concept acquisition, and so on.

It has absolutely nothing to do with whether you can emit text tokens in the right order to fool a user about whether the machine is a man or a woman (Turing's actual test). Nor does it have anything to do with whether you can fool a person at all.

No machine whose goal is to fool a user about the machine's intelligence has thereby any capacities. Kinda, obviously.

Turing's test not only displays a gross lack of concern for producing any capacities of intelligence in a system; as a research goal, it's actively hostile to the production of any such capacities, since it is trivial to fool people, and that requires no intelligence at all.

> General intelligence arises as skillful adaptive control over one's environment, through sensory-motor concept acquisition, and so on.

This isn't a generally accepted definition or process.

And indeed it seems to preclude people like Stephen Hawking, who had little control over his environment (or, to be pedantic, people who had similar conditions from birth).

Yeah, we don't have any science of intelligence; the only thing we have is empirical data, testing to see what works. That's why Turing tests are quite fundamental, imo.