Hacker News
by jzombie 923 days ago
My opinion: if you want to find out what works best, come up with a bunch of different variations, run each one in a context-free environment so prior results don't influence it, decide which metrics you are targeting, and start prompting away.

Then you will find the answer that works for you, and it will probably be far better thought out than 3/4 of the articles you will find on this sort of thing.
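
For what it's worth, here is a rough sketch of that loop in Python. Everything in it is made up for illustration: `fake_model` stands in for whatever LLM call you actually use (it is stateless, so each prompt runs with no prior context), and `ANSWERS`, `VARIANTS`, `CASES`, and the `exact_match` metric are hypothetical placeholders you would swap for your own.

```python
from statistics import mean

# Hypothetical answer key so the stand-in model can "answer" offline.
ANSWERS = {"What is 2 + 2?": "4", "What is 3 * 3?": "9"}

def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call. Stateless: no history carries over between calls."""
    answer = next(a for q, a in ANSWERS.items() if q in prompt)
    # Pretend the model only answers tersely when told to.
    return answer if "only the number" in prompt else f"The answer is {answer}."

def exact_match(output: str, expected: str) -> float:
    """Metric: 1.0 if the output is exactly the expected string, else 0.0."""
    return 1.0 if output.strip() == expected else 0.0

def score_variants(variants, cases, model, metric):
    """Run every prompt variant over every test case and average the metric."""
    scores = {}
    for name, template in variants.items():
        results = [
            metric(model(template.format(question=question)), expected)
            for question, expected in cases
        ]
        scores[name] = mean(results)
    return scores

# Prompt variations to compare (assumed examples).
VARIANTS = {
    "plain": "{question}",
    "terse": "{question} Reply with only the number.",
}
CASES = list(ANSWERS.items())

scores = score_variants(VARIANTS, CASES, fake_model, exact_match)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

With a real model you would want more cases and probably several runs per prompt, since outputs vary from call to call.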

1 comment

Prompt engineering is clearly "a thing", irrespective of whether one trains or builds models. LLMs have a wide range of possible outputs for a given prompt (even just from tuning temperature, top_p, and top_k), and modifying the prompt can lead to significant improvements in the output. It's not a science. It's not really an art either. Certain prompts lead to better outputs than others, and having a systematic way to characterize these differences is going to be important going forward.
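
To make the temperature / top_p / top_k point concrete, here is a toy sketch of how those three knobs reshape a next-token distribution. The logits and token names are invented, and real inference stacks differ in the order they apply the filters and in how they renormalize, so treat this as an illustration of the idea rather than any particular library's behavior.

```python
import math
import random

def filtered_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return token -> probability after temperature, top-k, and top-p filtering.

    Assumes temperature > 0; top_k=0 means "no top-k limit".
    """
    # Temperature: divide logits before the softmax (lower = sharper).
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Top-k: keep only the k most likely tokens.
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}

def sample_token(logits, rng, **kwargs):
    """Draw one token from the filtered distribution."""
    dist = filtered_distribution(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens])[0]

# Made-up logits for three candidate tokens.
LOGITS = {"cat": 2.0, "dog": 1.0, "eel": 0.0}

rng = random.Random(0)
print(filtered_distribution(LOGITS))                   # all three tokens survive
print(filtered_distribution(LOGITS, temperature=0.1))  # mass piles onto "cat"
print(filtered_distribution(LOGITS, top_p=0.5))        # only "cat" survives
print(sample_token(LOGITS, rng, top_k=2))              # "cat" or "dog"
```

The same prompt can therefore give quite different outputs depending on these settings, which is one reason to pin them down when comparing prompts.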

I personally stay abreast of new models as they come out and run an eval set against each one to assess its performance against other models (say, gpt-2, gpt-3.5-turbo, gpt-4, etc.).

In terms of grounding, there is RAG, which can be built in any number of ways (Postgres + pgvector, a vector store, a graph DB). I would follow arxiv.org publications to stay on top of SOTA prompting work, as well as adjacent publications (LLMs, scaling, and so on).
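
If it helps, the retrieval half of RAG boils down to something like the sketch below. It is deliberately a toy: the bag-of-words `embed` function stands in for a real embedding model, the in-memory `DOCS` list stands in for pgvector or another vector store, and the prompt wording in `build_prompt` is just an assumed example.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(docs, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Ground the question by placing the retrieved passages in the prompt."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Stand-in corpus; a real setup would keep these in a vector store.
DOCS = [
    "pgvector adds vector similarity search to Postgres.",
    "Bonsai trees need regular pruning.",
    "Temperature controls sampling randomness.",
]

query = "Which Postgres extension does vector search?"
passages = retrieve(query, DOCS, k=1)
prompt = build_prompt(query, passages)
print(prompt)
```

The resulting prompt is what you would send to the model, so the answer is grounded in the retrieved passage instead of whatever the model happens to remember.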

What kind of eval set do you use?
homegrown and full of love, like a carefully pruned garden of bonsai trees