Hacker News
by tobyjsullivan 1152 days ago
The argument is more nuanced. Importantly, the article is not making a judgement on the value of GPT in general (at least not explicitly). It is arguing against one specific narrative.

That narrative goes like this:

Step 1: give me a specific task T from a class of tasks C.

Step 2: I’ll show you that I can formulate a prompt to solve task T.

Step 3: therefore, if you engineer prompts well enough, you can find a single prompt that will solve any task in class C.

The argument is that Step 3 is a non sequitur and that’s the scam.

The question isn’t “is GPT useful” so much as “can products be built on top of GPT?”

1 comment

More formally, the system specifies a set S of predicates that it can evaluate based on configurations (prompts etc) applied to the general method.

Suppose you have a particular predicate p that you want to evaluate, drawn from some larger space P of “appropriate” predicates. Machine learning then usually gives you a claim of the form,

    ∀p in P. ∃s in S. p(x) → s(x)
Call this last bit “weak prediction.” p is not easily computable, or else you would not use machine learning, but machine learning can compute any s in S, and if we find this s then we can use s(x) as evidence for p(x) by Bayes' theorem or so. Machine learning has never affirmed strong prediction. In particular, there have always been ways to maliciously modify inputs: looking at the details of the machine s, you find s(x, y), “I classify an x as a y,” and s(x + Δx, y'), “I classify an x + Δx as a completely different y',” where humans literally cannot tell the difference between x and x + Δx because the alterations are in the “noise” of the data.
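To spell out the Bayes step, here is a rough sketch. It assumes you read p and s as events over x drawn from some distribution with P(s) > 0, which is my gloss and not something the notation above states. Under that reading, p(x) → s(x) means P(s | p) = 1, so:

```latex
P(p \mid s) \;=\; \frac{P(s \mid p)\,P(p)}{P(s)} \;=\; \frac{P(p)}{P(s)} \;\ge\; P(p)
```

So observing s(x) never lowers the probability of p(x), but the boost is only a factor of 1/P(s). If s fires on most inputs, it tells you very little, which is why this only counts as weak prediction.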

The “scam” is that you then tell people to hand-calculate a subset X ⊆ p (so p(x) holds for every x in X), and then sell this as,

    ∃s in S. ∀x in X. s(x)
What's the problem? I think the claim hints that there are a few:

1. Selection bias in claimed accuracy. You generate candidates s¹, s², s³, ... and analyze their accuracy over X to pick one, say the one that gets 96% of X right. The accuracy of the selected solution sⁿ is not 96%; claiming so is lying to yourself. The proper way to do this is to partition X into X¹ + X², use X¹ to select sⁿ, then evaluate its actual accuracy on X². (This is how I originally read the article.) Note also that p(x) has dropped out of the right-hand side of the new expression, which suggests “alignment” lapses, e.g. the machine learning algorithm that appeared to “learn” to identify tanks but was actually identifying clouds in the sky.

2. Selection bias in terms of problems solved. Researchers generate problems p¹, p², p³ ... in P, and we hear about pⁿ having solution sⁿ with 99% accuracy. This gives us an unrepresentative idea of what the success rates look like in P overall. (This is how I read your take, and it meshes better with the final comments in the post.)

3. This broader context looks suspiciously like a problem where you can drive the false positive rate arbitrarily low by raising the false negative rate arbitrarily high, which roughly might explain the tendency of ChatGPT et al. to hallucinate.
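Point 1 is easy to see in a toy simulation. This is only a sketch under made-up assumptions: every candidate "prompt" is a coin flip with a true accuracy of 50%, and the names and set sizes are mine, not from the article. Picking the best of many candidates on X¹ still produces an impressive-looking number, and the held-out X² exposes it.

```python
import random

# Hypothetical setup: every candidate "prompt" s is a pure coin flip, so its
# true accuracy on any task is 50%. All names and sizes here are illustrative.
N_CANDIDATES = 1000
N_SELECT = 100     # |X1|: hand-labelled examples used to pick a candidate
N_HOLDOUT = 1000   # |X2|: hand-labelled examples kept aside for evaluation

rng = random.Random(0)  # fixed seed so the run is repeatable


def random_candidate(n_items):
    """Return per-example correctness flags for one coin-flip candidate."""
    return [rng.random() < 0.5 for _ in range(n_items)]


def accuracy(flags):
    """Fraction of examples the candidate got right."""
    return sum(flags) / len(flags)


candidates = [random_candidate(N_SELECT + N_HOLDOUT) for _ in range(N_CANDIDATES)]

# Pick the candidate that looks best on X1 only.
best = max(candidates, key=lambda flags: accuracy(flags[:N_SELECT]))

selected_acc = accuracy(best[:N_SELECT])   # the number you are tempted to report
heldout_acc = accuracy(best[N_SELECT:])    # the number you should report

print(f"accuracy on X1 (used for selection): {selected_acc:.2f}")
print(f"accuracy on X2 (held out):           {heldout_acc:.2f}")
```

With these sizes the X¹ figure should land well above 50% while the X² figure stays close to 50%, even though no candidate is better than chance.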

This reminds me of issues raised by Sabine Hossenfelder with regard to grand unified theories, theories of everything, or string theory.

https://www.youtube.com/watch?v=lu4mH3Hmw2o

https://www.youtube.com/watch?v=mdu9KvLxHFg