| HN Mirror

by waynenilsen 983 days ago

This article is absurd.

> But when a large language model scores well on such tests, it is not clear at all what has been measured. Is it evidence of actual understanding? A mindless statistical trick? Rote repetition?

It is measuring how well it does _at REPLACING HUMANS_. It is hard to believe how the author clearly does not understand this. I don't care how it obtains its results.

GPT-4 is like a hyperspeed entry to mid level dev that has almost no ability to contextualize. Tools built on top of 32k will allow repo ingestion.

This is the worst it will ever be.

4 comments

COAGULOPATH 982 days ago

>It is measuring how well it does _at REPLACING HUMANS_

It's possible to do well on a test and have no ability to do the job the test is meant to screen for.

GPT-4 scores well on an advanced sommelier exam, but obviously cannot replace a human sommelier, because it does not have a mouth.

dartos 983 days ago

Which tests test specifically for “replacing humans?” That seems like a wild metric to try and capture in a test.

Also an aside:

> This is the worst it will ever be.

I hear this a lot and it really bothers me. Just because something is the worst it’ll ever be doesn’t mean it’ll get much better. There could always be a plateau on the horizon.

It’s akin to “just have faith.” A real weird sentiment that I didn’t notice in tech before 2021.

iudqnolq 982 days ago

GPT passed a test on the theoretical fundamentals of selling and serving wine in fancy restaurants. In a human, passing such a test provides a useful signal of job suitability, because people who pass it are often also capable of the physical bits, like theatrically opening wine bottles. But obviously that doesn't work for an AI.

Lots of things usefully correlate with test scores in humans but might not in an AI.

RandomLensman 983 days ago

It is measuring how well it does replacing humans - in those tests.
