| HN Mirror

	by mike741 1173 days ago
	If we're including tests that don't exist yet, then sure those are going to be difficult to pass. As far as actual ToM tests go though, such as the one being discussed in this thread, they can be easily passed by trivial hardcoding and say nothing of the program's internal reasoning.

lucubratory 1167 days ago

LLMs can pass novel theory of mind tests, which is what we're talking about. The whole point of good tests is you invent or withhold new ones it has never seen before and test it on them. You said "Those tasks could be completed a [sic] traditional static program.", and no, they can't. You're incorrect.

mike741 1165 days ago

> LLMs can pass novel theory of mind tests, which is what we're talking about.

Passing a ToM test is not what OP meant by having an "underlying theory of mind." OP is talking about the machine having an underlying mind (i.e., sentience, sapience, consciousness, etc.), whereas ToM tests only test output.

> You said "Those tasks could be completed a [sic] traditional static program.", and no, they can't. You're incorrect.

They can. A static program as I described would indeed answer that one question correctly, resulting in a positive ToM score, without seeing any training data whatsoever. Did the programmer see it? Maybe, but the machine didn't, and it would pass the test regardless.

lucubratory 1163 days ago

If you have put the answer into the program then by definition you had the test available to you when you finalised the program, which means it is definitionally not a novel test.
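To make the disagreement concrete, here is a minimal sketch of the kind of static program under discussion. Everything in it is an illustrative assumption: the question wording (loosely modeled on the Sally-Anne false-belief task), the `static_program` and `score` helpers, and the expected answers are invented and do not come from any actual benchmark.

```python
# Hypothetical false-belief question the programmer saw before shipping.
KNOWN_TEST = (
    "Sally puts her ball in the basket and leaves. "
    "Anne moves it to the box. Where will Sally look for the ball?"
)

# Hypothetical held-out question written after the program was finalised.
HELD_OUT_TEST = (
    "Tom hides his keys in the drawer and leaves. "
    "Ann moves them to the shelf. Where will Tom look for the keys?"
)

# The "static program": a lookup table of answers the programmer typed in.
HARDCODED_ANSWERS = {KNOWN_TEST: "basket"}


def static_program(question: str) -> str:
    """Return the canned answer for an exact known question, else 'unknown'."""
    return HARDCODED_ANSWERS.get(question, "unknown")


def score(program, tests) -> float:
    """Fraction of (question, expected_answer) pairs the program gets right."""
    return sum(program(q) == a for q, a in tests) / len(tests)


seen_score = score(static_program, [(KNOWN_TEST, "basket")])
held_out_score = score(static_program, [(HELD_OUT_TEST, "drawer")])
print(seen_score, held_out_score)
```

Under these assumptions the lookup table scores perfectly on the question its author had in hand and fails the withheld one, which is the distinction "novel" is meant to capture here.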

mike741 1161 days ago

The test is novel to the program, just not its programmer. So are we testing the program or are we actually testing its programmer? If we're testing the program, then the programmer's foreknowledge is irrelevant.

lucubratory 1160 days ago

> The test is novel to the program

That's funny, I thought you said the test's answer was embedded into the program, making it definitionally not novel to the program.

Anyway, this is boring. You've had five or more opportunities to understand what the word "novel" means in an ML testing context and are choosing wilful obtuseness instead.

mike741 1151 days ago

> in an ML testing context

OP was not speaking in the ML testing context, hence the misunderstanding.
