
More formally, the system specifies a set S of predicates that it can evaluate, based on configurations (prompts, etc.) applied to the general method.

You have a particular predicate p that you want to evaluate, in some larger space P of “appropriate” predicates. Usually machine learning then gives you some claim that,

    ∀p in P. ∃s in S. p(x) → s(x)

Call this last bit “weak prediction.” p is not easily computable, or else you would not use machine learning; but machine learning can compute any s in S, and if we find this s then we can use s(x) as evidence for p(x), by Bayes' theorem or so. Machine learning has never affirmed strong prediction. In particular there have always been ways to maliciously modify the input: you look at the details of the machine s, and you find s(x, y), “I classify x as y,” and s(x + Δx, y'), “I classify x + Δx as a completely different y',” where humans literally cannot tell the difference between x and x + Δx; the alterations are in the “noise” of the data.
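To make the "evidence by Bayes' theorem" step concrete, here is a minimal sketch. It assumes only the weak-prediction claim p(x) → s(x), which means P(s | p) = 1. The function name, the prior, and the false-alarm rate are hypothetical inputs I made up for illustration, not anything the method hands you.

```python
from fractions import Fraction

def posterior_p_given_s(prior_p, false_alarm_rate):
    """P(p(x) | s(x)) under the assumption p(x) -> s(x), i.e. P(s | p) = 1.

    prior_p          -- P(p(x)), a hypothetical base rate for the property
    false_alarm_rate -- P(s(x) | not p(x)), how often s fires without p
    """
    prior_p = Fraction(prior_p)
    false_alarm_rate = Fraction(false_alarm_rate)
    # Total probability that s fires: P(s|p) P(p) + P(s|not p) P(not p)
    evidence = prior_p + false_alarm_rate * (1 - prior_p)
    if evidence == 0:
        raise ValueError("s(x) never fires under these assumptions")
    return prior_p / evidence

# Hypothetical numbers: 1% base rate, s fires on 5% of the non-p cases.
print(posterior_p_given_s(Fraction(1, 100), Fraction(1, 20)))
```

So s(x) does raise the odds of p(x), but by how much depends entirely on how often s fires when p is false, and the weak-prediction claim says nothing about that. If s fires on everything, the posterior is just the prior.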

The “scam” is that you then tell people to hand-calculate a subset X ⊆ p (so p(x) holds for every x in it), and then sell this as,

    ∃s in S. ∀x in X. s(x)

What's the problem? I think the claim hints that there are a few:

1. Selection bias in claimed accuracy. You generate candidates s¹, s², s³, ... and analyze their accuracy over X to pick one, say the one that gets 96% of X right. The accuracy of the selected solution sⁿ is not 96%; claiming so is lying to yourself, because that number was itself used to make the selection. The proper way to use this is to partition X into X¹ + X², use data X¹ to select sⁿ, then evaluate its actual accuracy on X². (This is how I originally read the article.) In particular there is a loss of p(x) from the right-hand side of the new expression, suggesting “alignment” lapses, e.g. the machine learning algorithm that appeared to “learn” to identify tanks but actually was identifying clouds in the sky.

2. Selection bias in terms of problems solved. Researchers generate problems p¹, p², p³, ... in P and we hear about pⁿ having solution sⁿ with 99% accuracy; this gives us an unrepresentative idea of what the success rates look like in P overall. (This is how I read your take, and it meshes better with the final comments in the post.)

3. This broader context looks suspiciously like a problem where you can drive the false positive rate arbitrarily low by raising the false negative rate arbitrarily high, which roughly might explain the tendency of ChatGPT et al. to hallucinate.
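Point 1 is easy to see in a toy simulation. This is only a sketch under made-up assumptions: the target p, the data X, and the 500 candidates are all hypothetical, and every candidate is a pure coin-flipper with no skill at all. The best of them is picked on X¹ and then re-scored on X², which played no part in the choice.

```python
import random

def accuracy(s, p, xs):
    """Fraction of xs on which candidate s agrees with the target p."""
    return sum(s(x) == p(x) for x in xs) / len(xs)

def select_then_evaluate(candidates, p, x_select, x_holdout):
    """Pick the candidate that scores best on x_select, then score it
    again on x_holdout, which was not used to make the choice."""
    best = max(candidates, key=lambda s: accuracy(s, p, x_select))
    return accuracy(best, p, x_select), accuracy(best, p, x_holdout)

def make_guesser(rng, xs):
    """A candidate with no skill: one fixed coin flip per x."""
    table = {x: rng.random() < 0.5 for x in xs}
    return lambda x: table[x]

rng = random.Random(0)            # fixed seed so reruns match
X = list(range(200))
p = lambda x: x % 2 == 0          # hypothetical stand-in for the hand labels
X1, X2 = X[:100], X[100:]         # the partition X = X1 + X2
candidates = [make_guesser(rng, X) for _ in range(500)]

selected_acc, holdout_acc = select_then_evaluate(candidates, p, X1, X2)
print(f"accuracy on X1 (used to select): {selected_acc:.2f}")
print(f"accuracy on X2 (held out):       {holdout_acc:.2f}")
```

The first number should land noticeably above 0.5 even though no candidate knows anything, simply because it is the maximum over many noisy scores; the second should sit near 0.5, which is the honest figure.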