by visarga
872 days ago
Just editing a text prompt is 5% of the task. The hard part is evaluating. I would have tried a different approach. The UI should host:

- a list of models
- a list of prompt variants
- a collection of input-output pairs

The prompt can be enhanced with demonstrations. Then we can evaluate based on string matching or with GPT-4 as a judge. By trying many combinations we can find the best prompt, demos and model, and we can monitor regressions. The prompt should be packed with a few labeled examples for demonstrations and eval; a bare text prompt won't be enough to know whether you have really honed it.
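Here is a minimal Python sketch of that evaluation loop. It assumes each model is wrapped as a plain function from prompt text to completion text. The names `build_prompt`, `exact_match_score`, `grid_search` and `regressions` are hypothetical, and the two toy "models" are stand-ins for real LLM calls. Exact string matching is used as the scorer; a GPT-4-as-judge scorer could replace `exact_match_score` without changing the search.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: a "model" is any function prompt -> completion.
Model = Callable[[str], str]
Example = Tuple[str, str]  # (input, expected output)


def build_prompt(template: str, demos: List[Example], query: str) -> str:
    """Pack labeled demonstrations ahead of the query (few-shot prompt)."""
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    return f"{template}\n{shots}\nInput: {query}\nOutput:"


def exact_match_score(model: Model, template: str, demos: List[Example],
                      eval_set: List[Example]) -> float:
    """Fraction of held-out examples whose output matches the label exactly."""
    hits = sum(
        model(build_prompt(template, demos, x)).strip() == y.strip()
        for x, y in eval_set
    )
    return hits / len(eval_set)


def grid_search(models: Dict[str, Model], templates: List[str],
                demo_sets: List[List[Example]], eval_set: List[Example]):
    """Score every (model, prompt variant, demo set) combination."""
    results = {}
    for (name, model), (ti, template), (di, demos) in product(
            models.items(), enumerate(templates), enumerate(demo_sets)):
        results[(name, ti, di)] = exact_match_score(model, template, demos, eval_set)
    best = max(results, key=results.get)  # first combination with the top score
    return best, results


def regressions(old: Dict, new: Dict, tol: float = 0.0) -> List:
    """Configurations whose score dropped by more than tol since the last run."""
    return [k for k in old if k in new and new[k] < old[k] - tol]


# Toy stand-ins for real models: one uppercases the final query, one always fails.
def upper_model(prompt: str) -> str:
    return prompt.rsplit("Input: ", 1)[1].split("\n")[0].upper()


def dumb_model(prompt: str) -> str:
    return "?"


models = {"upper": upper_model, "dumb": dumb_model}
templates = ["Uppercase the input.", "Echo the input."]
demo_sets = [[], [("x", "X")]]
eval_set = [("abc", "ABC"), ("hi", "HI")]

best, scores = grid_search(models, templates, demo_sets, eval_set)
```

Keeping the demos separate from the eval set matters: scoring a prompt on the same examples it was packed with would overstate how well it generalizes. Storing `scores` from each run and passing the previous and current dicts to `regressions` is one simple way to catch a prompt edit that quietly makes some configuration worse.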