by rfw300 497 days ago
I have done the latter much more than the former. In my experience, the issues come from inputs you don’t foresee, not from reliability on in-distribution uses (which would be your “training” data for prompt optimization). The worry is that this kind of optimization would lead to substantive revisions of the guidelines set out in the prompt, which could further compromise performance out of distribution.

To the extent that you need to eke out reliability at the margins, you are vastly better served by actual fine-tuning, which is available both for open-source models and for most major proprietary models.