
by dollo_7 807 days ago

It is also about getting the most out of that hypothesis testing by defining success and failure as precisely as you can.

I have encountered this "mediocre success" many times in AI solutions, and it usually comes from a poorly defined problem. For instance, with LLMs it is now very easy to write a prompt that gives you the output you want on the 5 or 6 examples you have in mind. The hard part is building a test scenario from there: gathering data until the set is representative of your use cases.

That is the only way to actually test your prompts, RAG strategies, and so on, instead of buying into the latest CoT-like prompt trend.
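
To make that concrete, here is a rough sketch of what I mean by a testing scenario, in Python. Everything in it is an assumption for illustration: `fake_model` is a hypothetical stand-in for whatever LLM call you actually use, and the `Case` structure, prompt template, and sentiment task are made up. The point is only that each case carries its own definition of success, so you get a pass rate and a list of failures instead of a gut feeling.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    name: str
    user_input: str
    check: Callable[[str], bool]  # explicit definition of "success" for this case


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; swap in your own client."""
    return "negative" if "refund" in prompt.lower() else "positive"


def evaluate(
    template: str, cases: List[Case], model: Callable[[str], str]
) -> Tuple[float, List[str]]:
    """Run every case through the model and return (pass rate, failed case names)."""
    if not cases:
        raise ValueError("need at least one test case")
    failures = []
    for case in cases:
        output = model(template.format(input=case.user_input))
        if not case.check(output):
            failures.append(case.name)
    return (len(cases) - len(failures)) / len(cases), failures


# Assumed prompt and cases; in practice you keep adding cases until the set
# is representative of your real traffic.
PROMPT = "Classify the sentiment of this message: {input}"
CASES = [
    Case("refund_request", "I want a refund, this broke in a day.", lambda o: o == "negative"),
    Case("happy_customer", "Works great, thanks!", lambda o: o == "positive"),
    Case("sarcasm", "Oh great, it broke again.", lambda o: o == "negative"),
]

pass_rate, failed = evaluate(PROMPT, CASES, fake_model)
print(f"pass rate: {pass_rate:.2f}, failed: {failed}")
```

The sarcasm case is exactly the kind of example that is not among the 5 or 6 you had in mind when writing the prompt. Once it is in the set, every prompt or RAG change gets measured against it.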