Hacker News
by cl42 1115 days ago
I love these multi-step prompt 'hacks'. They very much take advantage of the fact that this is still 'just' a model predicting the next token.

Asking a model to write an email as if it were written by Hemingway requires the model to generate a probability distribution conditioned on both the content of the email it needs to write and the style it needs to write it in.

In the second approach, you've changed the model's inputs (the weights stay the same) by including the email in the context window, so the task of predicting the next token is fundamentally different (and possibly easier) for the model.
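A rough sketch of the difference between the two approaches, in case it helps. `generate` here is a hypothetical stand-in for whatever completion call you use (it is not a real library API), and the prompt wording is just an assumption for illustration:

```python
from typing import Callable

# Hypothetical signature: takes a prompt string, returns the model's completion.
Generate = Callable[[str], str]


def one_step(generate: Generate, task: str, style: str) -> str:
    # Content and style both have to be worked out in a single pass.
    return generate(f"Write {task} in the style of {style}.")


def two_step(generate: Generate, task: str, style: str) -> str:
    # Step 1: get a plain draft with no style constraint.
    draft = generate(f"Write {task}.")
    # Step 2: the draft is now in the context window, so the model is
    # predicting tokens for a restyling task instead of writing from scratch.
    return generate(
        f"Rewrite the following in the style of {style}:\n\n{draft}"
    )


def echo_model(prompt: str) -> str:
    # Toy model so the sketch runs offline; it only reports what it was asked.
    return f"<completion for: {prompt!r}>"


if __name__ == "__main__":
    print(two_step(echo_model, "an email declining a meeting", "Hemingway"))
```

Same weights in both cases; the only thing that differs is what the second call gets to condition on.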

It's also why models are sometimes bad at answering a factual question, but good at judging whether their own answer is correct.
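The same idea as a minimal answer-then-judge sketch. Again, `generate` is a hypothetical stand-in for a completion call, and the prompt text and the "starts with yes" check are assumptions, not anything standard:

```python
from typing import Callable, Tuple

# Hypothetical signature: takes a prompt string, returns the model's completion.
Generate = Callable[[str], str]


def answer_then_judge(generate: Generate, question: str) -> Tuple[str, bool]:
    # Pass 1: the model has to produce the answer from the question alone.
    answer = generate(f"Question: {question}\nAnswer:").strip()
    # Pass 2: the answer is now part of the input, so the model is doing a
    # different prediction task: judging text that is already in its context.
    verdict = generate(
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "Is the proposed answer correct? Reply yes or no."
    )
    return answer, verdict.strip().lower().startswith("yes")
```

The judging pass is not guaranteed to be right either; it is just a different (sometimes easier) prediction problem.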

1 comment

> Those hacks are snake oil
Those hacks are literally how a large language model using a transformer architecture to predict the next token in a sequence works.

They take advantage of how a function that chooses the next token by its probability of appearing (at its simplest, the token with maximal probability) works.
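For anyone who wants to see that function, here is a minimal sketch of greedy selection. The logits are made-up numbers for illustration, and real systems often sample from the distribution instead of always taking the maximum:

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits.values())
    exps = {token: math.exp(value - peak) for token, value in logits.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


def greedy_next_token(logits: Dict[str, float]) -> str:
    # Greedy decoding: pick the single most probable token.
    probs = softmax(logits)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Hypothetical scores a model might assign to candidate next tokens.
    toy_logits = {"Dear": 2.5, "Hello": 1.0, "Yo": -1.0}
    print(softmax(toy_logits))
    print(greedy_next_token(toy_logits))
```

Changing the prompt changes the logits, which changes the distribution, which changes what gets picked. That is all the "hack" is doing.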