by dicethrowaway1 484 days ago

This is a deepity.

The trivial interpretation is: every word written can be constructed by optimizing a prediction based on current state, what has been written so far, and a sufficiently complex model. This is true of anything computable: just make the method implicitly contain the program by assigning a high probability to any token that is consistent with running the computation one more step. It's also true of anything expressible: just brute force a solution that can be expressed in n words, then assign a high probability to the first word of these n words.
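To make the trivial reading concrete, here is a minimal sketch of a "next-token predictor" that simply hides a program inside itself. The task (adding two integers written as `a+b=`), the character-level tokens, and the names `next_token_distribution` and `generate` are all made up for illustration. The "model" puts probability 1 on whichever character continues the computed answer.

```python
from typing import Dict


def next_token_distribution(text: str) -> Dict[str, float]:
    """Return a 'prediction' for the next character of text shaped like 'a+b=...'.

    Nothing statistical happens here: the function runs the computation
    and puts all probability mass on the character that continues its output.
    """
    prompt, _, emitted = text.partition("=")
    a, b = prompt.split("+")
    answer = str(int(a) + int(b)) + "\n"  # "\n" is our hypothetical end-of-output token
    if emitted == answer or not answer.startswith(emitted):
        return {}  # finished, or the text so far is inconsistent with the answer
    return {answer[len(emitted)]: 1.0}


def generate(prompt: str) -> str:
    """Greedy decoding: repeatedly append the most probable next token."""
    text = prompt
    while True:
        dist = next_token_distribution(text)
        if not dist:
            break
        token = max(dist, key=dist.get)
        if token == "\n":
            break
        text += token
    return text


print(generate("12+30="))  # prints 12+30=42
```

From the outside this looks like token-by-token prediction, but all of the work is done by the embedded program, which is why this reading is true and uninteresting.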

The profound but wrong interpretation is that intelligence is just statistical prediction according to some general-purpose algorithm, and that this algorithm is tractable. Consider something like solving a SAT problem. You're going to have a hard time using any tractable general-purpose algorithm to predict whether x_2 is true in a satisfying assignment, given some long CNF formula plus "x_1 is false".

Now, what you _can_ do is augment your model so that if the previous tokens constitute a CNF-SAT instance plus a partial answer, then you cart these off to a SAT solver and output its next token. But the more you do this, the less force the "mere statistical prediction" claim has. The "next-token predictor" is just an interface to an assembly of different approaches, and often these approaches (like the SAT solver) will output the whole solution all at once for free.
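A rough sketch of that augmentation, assuming a DIMACS-style clause encoding and a brute-force solver standing in for a real one. The names `solve` and `next_token`, the `"x2=true"` token format, and the convention that the partial answer fixes x_1..x_k in order are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x_3, -3 means NOT x_3


def solve(clauses: List[Clause], n_vars: int,
          fixed: Dict[int, bool]) -> Optional[Dict[int, bool]]:
    """Brute-force search for a satisfying assignment that extends `fixed`.

    Exponential in the number of free variables; a stand-in for a real solver.
    """
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for values in product([False, True], repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, values))}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None


def next_token(clauses: List[Clause], n_vars: int,
               fixed: Dict[int, bool]) -> Optional[str]:
    """'Predict' the next variable's value, assuming `fixed` covers x_1..x_k."""
    solution = solve(clauses, n_vars, fixed)  # the full answer is already here
    next_var = len(fixed) + 1
    if solution is None or next_var > n_vars:
        return None
    return f"x{next_var}={'true' if solution[next_var] else 'false'}"


# (x1 OR x2) AND (NOT x2 OR x3) AND (NOT x1 OR NOT x3)
example = [[1, 2], [-2, 3], [-1, -3]]
print(next_token(example, 3, {1: False}))  # prints x2=true
```

Note that `next_token` discards most of what `solve` returns: the solver produces the entire assignment in one call, and the token-at-a-time interface only hands it out piecemeal.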