Hacker News
by FishInTheWater 1069 days ago
Prompt "engineering" is just writing prayers to forest faeries.

Whilst BASIC/JavaScript/etc. are all magic incantations to a child, a child will soon figure out there's underlying logic, and learn to reason about what code does and what certain changes will do.

With prompts, it's all faerie logic. There is nothing to learn, there are only magic incantations that change drastically if the model is updated.

Worse yet, the incantations cannot be composed. E.g. take the SQL statement "SELECT column FROM table WHERE column = [%s]". For any given string you insert here, the output is predictable. You can even know which characters would trigger an injection attack.
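To make the SQL point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table, the sample rows, and the function names are made up for illustration; the only claim is the general one above, that you can tell ahead of time which inputs change the meaning of the query and which cannot.

```python
import sqlite3

# In-memory database with a hypothetical table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])


def lookup_unsafe(name):
    # String interpolation: a single quote in `name` can change the
    # structure of the query, and we know exactly which character does it.
    query = "SELECT name FROM users WHERE name = '%s'" % name
    return [row[0] for row in conn.execute(query)]


def lookup_safe(name):
    # Bound parameter: the input is always treated as data, never as SQL.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


payload = "x' OR '1'='1"
print(lookup_unsafe(payload))  # the quote breaks out of the string literal
print(lookup_safe(payload))    # no user is literally named like the payload
```

With the bound parameter, the same input always yields the same, explainable result. That is the predictability being contrasted with prompts.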

With prompts you cannot predict results. Any word, phrase, or sequence of characters may upset the faeries and cause the model to misbehave in who knows what way. No processing of user-input will stop injection attacks.

Whilst it's dubious to call current software development practices "engineering", it's utterly ridiculous to do so for prompt-writing.

6 comments

I don't get where this sentiment comes from. I build software specifically on the concept of predictable results from LLMs being composable.

Sure, the results are not deterministic in that 100% of the time the exact prompt returns the exact same result, but you can tune your prompts so that 100% of the time they give you a valid result in the result category you were seeking, and with a specific probability distribution of available choices.

Prompts are functions that can take concrete input and create a probabilistic output that can be automated upon, especially if you only need to output one token, i.e. a number, boolean, word, or object reference. And for obvious reasons, the further you forecast out in a sequence, the less accurate you will be.

As long as you don't change the underlying model, in a massive model with billions of parameters, there are definitely mechanisms and behaviors to discover that you can reason about.
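A rough sketch of what "a prompt as a function with a constrained output" could look like, in Python. Everything here is an assumption for illustration: `make_fake_model` is a stand-in for a real LLM call, and the label set, the weights, and the retry count are invented. It only shows the shape of the idea: accept the output when it falls in the expected category, and surface a failure otherwise.

```python
import random

ALLOWED = ("positive", "negative", "neutral")


def make_fake_model(seed=0):
    # Hypothetical stand-in for an LLM: usually returns a valid label,
    # occasionally returns something outside the expected category.
    rng = random.Random(seed)

    def model(prompt):
        return rng.choices(
            ["positive", "negative", "neutral", "As an AI, I cannot..."],
            weights=[70, 15, 10, 5],
        )[0]

    return model


def classify(model, prompt, max_attempts=3):
    """Return an allowed label, or None if the model never produces one."""
    for _ in range(max_attempts):
        answer = model(prompt).strip().lower()
        if answer in ALLOWED:
            return answer
    return None  # the caller has to handle this case explicitly


model = make_fake_model()
labels = [classify(model, "Sentiment of: 'great product!'") for _ in range(20)]
print(labels)
```

Note that the check makes invalid outputs detectable rather than impossible, and it says nothing about whether a valid-looking label is the right one.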

but you can tune your prompts so that 100% of the time they give you a valid result in the result

You can't though, that's the issue. Illustrative here are tokens like "SolidGoldMagikarp", but this does happen to "normal" sequences of tokens as well.

There is no filter you can build to keep out such mistakes; any set of otherwise normal tokens could trigger the model to produce wrong output.

Because of how large these models and most prompts are, even slight changes in things like attention can cascade into extremely different results.

there are definitely mechanisms and behaviors to discover that you can reason about.

It's faerie logic. The behaviours are mere trends and observations, not underlying truth.

The faeries reward you for offering them fruit. But offer them an apple that fell from the tree exactly 74 hours ago, down to the second, and they'll kill you. There is no way to know ahead of time which things will upset them.

The risk here is that you're fooled into believing these systems are understandable, that you know how they work, and that you'll mistakenly use them for something where wrong results have consequences. You'll stop double-checking the output, all humans are lazy like that, and then you'll have a disaster on your hands.

You can reasonably expect an LLM to respond appropriately often. What percentage of the time depends on the details, but it’s not much more magic than expecting the bridge you built to hold up.

You could do a sort of validation of output by prompting the LLM repeatedly with the same prompt and then comparing the responses to eliminate outliers. I do feel like this stuff is magic though, just wanted to provide a counterpoint.
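The repeated-prompting idea could be sketched roughly like this in Python. The `ask` callable is a placeholder for whatever actually queries the model, and the sample count and agreement threshold are arbitrary assumptions; a simple majority vote is used here as one possible way to "eliminate outliers".

```python
from collections import Counter


def majority_answer(ask, prompt, n=5, min_agreement=0.6):
    """Ask the same prompt n times; keep the answer only if enough runs agree."""
    answers = [ask(prompt) for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    if count / n >= min_agreement:
        return answer
    return None  # no consensus, so treat the result as unreliable


# Canned responses standing in for five model calls (hypothetical data).
canned = iter(["42", "42", "41", "42", "42"])
result = majority_answer(lambda prompt: next(canned), "What is 6 * 7?")
print(result)
```

This only filters out answers the model disagrees with itself about. If the model is consistently wrong, the vote will agree on the wrong answer.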
In "The Information," James Gleick discusses a concept related to our current discourse. In the days when computers were merely an array of switching circuits, luminaries such as Claude Shannon believed that "thinking" could be captured in a structured format of logical representation.

However, even with formally composable languages like JavaScript, a semblance of unpredictability — akin to the "faerie logic" metaphor — still persists. Languages evolve over time; Python, for instance, with its various imports that constantly disrupt my code, serves as a good example. This is perhaps the reason behind the emergence of containers to ensure code consistency.

While some elements may be more "composable" than others, it appears increasingly unrealistic in today's world to encapsulate thought processes or interactions with systems within a rigid logical framework. Large Language Models (LLMs) will keep evolving and improving, making continual interaction with them unavoidable. The notion that we can pass a set of code or words through them once and expect a flawless result is simply illogical.

I firmly believe that any effective system should incorporate a robust user interaction component, regardless of the specific task or problem at hand.

It's not so much about formal logic, but general predictability.

even with formally composable languages like JavaScript, a semblance of unpredictability — akin to the "faerie logic" metaphor — still persists

And they're ridiculed for it, and as you state, we design around them or replace such systems entirely.

making continual interaction with them unavoidable

Technology is never unavoidable or "inevitable". We can choose not to use it, or when to use it.

The notion that we can pass a set of code or words through them once and expect a flawless result is simply illogical.

Yet that is what we expect when we put these systems into production use, especially when many proposed use cases are user-facing and subject to injection attacks.

Whether it be the writing of ad copy, the processing of loan applications, or the generation of code, mistakes in these tasks have very real consequences.

I don't disagree that we can choose whether to use it, but my point was more that, if we want a good experience with LLMs, we have to keep interacting with them to achieve good results.

Reminds me of raising kids...

You're too right.

We need to move away from prompt-engineering - it's AI-Management. You pretend you're speaking to another (albeit confusing/confused) person when extracting work from a model. You're coaxing things out of it based on hearsay and mysticism that work most of the time. Sounds a lot like AGILE and free pizza to get a junior to stay late and deliver on time.

That's not engineering, that's management.

It’s so refreshing to see someone actually write this about prompt writing. It makes an extremely refreshing change from Twitter AI influencers posting their ridiculous prose as some marvel of harnessing LLMs.

You cannot predict results in _any_ domain with 100% accuracy, especially not in most engineering domains.

Why do you think rockets explode, bridges collapse, etc.?

This was magical, really made my day. Thanks for this.