HN Mirror

but you can tune your prompts so that 100% of the time they give you a valid result in the result

You can't, though; that's the issue. Tokens like "SolidGoldMagikarp" are the illustrative case, but this happens with "normal" sequences of tokens as well.

There is no filter you can build to keep out such mistakes: any set of otherwise normal tokens could trigger the model to produce wrong output.

Because of how large these models and most prompts are, even slight changes in things like attention can cascade into extremely different results.

there are definitely mechanisms and behaviors to discover that you can reason about.

It's faerie logic. The behaviours are mere trends and observations, not underlying truth.

The faeries reward you for offering them fruit. But offer them an apple that fell from the tree exactly 74 hours ago, down to the second, and they'll kill you. There is no way to know ahead of time which things will upset them.

The risk here is that you're fooled into believing these systems are understandable, that you know how they work, and that you'll mistakenly use them for something where wrong results have consequences. You'll stop double-checking the output (all humans are lazy like that), and then you'll have a disaster on your hands.
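
To make "double-checking" concrete, here is a minimal sketch of a check that lives in code, outside the model, and runs on every single response. It assumes the model is supposed to reply with a JSON object holding a numeric `score` between 0 and 1; that schema and the `validate_reply` name are made up for illustration, not taken from any real system.

```python
import json


def validate_reply(raw: str) -> float:
    """Return the score from a model reply, or raise ValueError.

    Hypothetical schema (an assumption for this sketch):
    {"score": <number between 0 and 1>}
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Covers chatty preambles, truncated output, stray text, etc.
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")

    score = data.get("score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("missing or non-numeric 'score'")

    # NaN fails this comparison too, so it is rejected here.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")

    return float(score)
```

Note what this does and doesn't buy you: it catches output that is malformed, but not output that is well-formed and simply wrong. That second kind still needs a human, or an independent check, looking at it.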