|
|
|
|
|
by aickin
779 days ago
|
|
Hey there, I'm the founder of a company called Libretto, which is building tools to automate prompt engineering, and I wanted to share this blog post we just put out about empirical testing of few-shot examples. We took a prompt from Big Bench and created a few dozen variants of our prompt with different few-shot examples, and we found that there was a 19 percentage point difference between the worst and best set of few-shot examples. Funnily, the worst-performing set was when we used examples that all happened to have a one word answer, and the LLM seemed to learn that replying with one word answers was more important than actually being accurate. Sigh. Moral of the story: which few shot examples you choose matters, sometimes by a lot! |
|