
Prompt engineering is clearly "a thing," irrespective of whether one trains or builds models. LLMs have a wide range of possible outputs for a given prompt (even when you only tune temperature, top_p, and top_k), and modifying the prompt itself can lead to significant improvements in the output. It's not a science. It's not really an art either. Certain prompts lead to better outputs than others, and having a systematic way to characterize these differences is going to be important going forward.

I personally stay abreast of new models as they come out and run an eval set against each one to compare its performance with other models (say, gpt-2, gpt-3.5-turbo, gpt-4, etc.).
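To make "run an eval set" concrete, here is a rough sketch of the kind of harness I mean. Everything in it is made up for illustration: the `EVALS` cases, the substring-match scoring, and the stub "models" are assumptions, and in practice each callable would wrap a real API call to whatever model you are testing.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical eval set: (prompt, substring the answer should contain).
EVALS: List[Tuple[str, str]] = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]


def run_evals(
    models: Dict[str, Callable[[str], str]],
    evals: List[Tuple[str, str]] = EVALS,
) -> Dict[str, float]:
    """Return, per model name, the fraction of eval cases it passes."""
    scores: Dict[str, float] = {}
    for name, generate in models.items():
        passed = sum(
            1
            for prompt, expected in evals
            if expected.lower() in generate(prompt).lower()
        )
        scores[name] = passed / len(evals)
    return scores


# Stub "models" standing in for real API clients (assumption, not a real API).
def stub_good(prompt: str) -> str:
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Capital of France?": "Paris.",
    }
    return canned.get(prompt, "")


def stub_bad(prompt: str) -> str:
    return "I don't know."


if __name__ == "__main__":
    print(run_evals({"good": stub_good, "bad": stub_bad}))
```

Substring matching is the crudest possible scorer. The point is only that the same fixed set of cases gets run against every model, so the numbers are comparable across releases.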

For grounding, there is RAG, which can be built in any number of ways (Postgres + pgvector, a dedicated vector store, a graph DB). I would follow arxiv.org publications to stay on top of SOTA prompting techniques, as well as adjacent work (LLMs, scaling, and so on).
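For anyone who hasn't built RAG before, the retrieval half can be sketched in a few lines. This is a toy under stated assumptions: `embed` here is just a bag-of-words count standing in for a real embedding model, the document list lives in memory instead of pgvector or a vector store, and the prompt template in `build_prompt` is my own invention.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Put the retrieved documents in front of the question (hypothetical template)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "pgvector adds vector similarity search to Postgres",
        "Graph databases store nodes and edges",
        "Temperature controls sampling randomness",
    ]
    print(build_prompt("how does vector search work in postgres", docs, k=1))
```

Swapping the storage layer (Postgres, a vector store, a graph DB) changes where `retrieve` looks, not the overall shape: embed the query, rank stored chunks by similarity, and paste the top few into the prompt.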