by waynenilsen 983 days ago
This article is absurd.

> But when a large language model scores well on such tests, it is not clear at all what has been measured. Is it evidence of actual understanding? A mindless statistical trick? Rote repetition?

It is measuring how well it does _at REPLACING HUMANS_. It is hard to believe that the author does not understand this. I don't care how it obtains its results.

GPT-4 is like a hyperspeed entry-to-mid-level dev that has almost no ability to contextualize. Tools built on top of the 32k context window will allow repo ingestion.

This is the worst it will ever be.

4 comments

>It is measuring how well it does _at REPLACING HUMANS_

It's possible to do well on a test and have no ability to do the job the test is meant to screen for.

GPT-4 scores well on an advanced sommelier exam, but obviously cannot replace a human sommelier, because it does not have a mouth.

Which tests test specifically for “replacing humans”? That seems like a wild metric to try to capture in a test.

Also an aside:

> This is the worst it will ever be.

I hear this a lot and it really bothers me. Just because something is the worst it’ll ever be doesn’t mean it’ll get much better. There could always be a plateau on the horizon.

It’s akin to “just have faith.” A really weird sentiment that I didn’t notice in tech before 2021.

GPT passed a test on the theoretical fundamentals of selling and serving wine in fancy restaurants. In a human, passing such a test provides a useful signal of job suitability, because people who pass it are often also capable of the physical bits, like theatrically opening wine bottles. But obviously that doesn't work for an AI.

Lots of things usefully correlate with test scores in humans but might not in an AI.

It is measuring how well it does at replacing humans - in those tests.