by intended 990 days ago
Don't.

If you must, there is a continuum of tasks that range from suitable to risky in production settings.

Definitely choose things on the suitable side of that scale (e.g., text generation or classification).

More complex tasks like data-to-text or summarization? I personally would always avoid them, unless your team or task has a very specific workflow that calls for them.

Further, it's not just test cases; it's an entire evaluation and prompt-versioning layer. Of the few such tools I am aware of, most are not even openly available (including Azure Prompt flow).
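
To make "evaluation and prompt versioning layer" concrete, here is a rough Python sketch of the smallest version of the idea: each prompt gets a version id derived from its text, and every eval run is recorded against that id. All names here (PromptVersion, EvalCase, evaluate, fake_model) are hypothetical and made up for illustration; this is not how Azure Prompt flow or any other product works, and fake_model is just a stand-in for a real LLM call.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptVersion:
    """A named prompt template (hypothetical type for this sketch)."""
    name: str
    template: str

    @property
    def version(self) -> str:
        # Content hash: any edit to the template yields a new version id.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]


@dataclass
class EvalCase:
    """One test input plus a predicate that judges the model output."""
    inputs: dict
    check: Callable[[str], bool]


def evaluate(prompt: PromptVersion, model: Callable[[str], str], cases: list) -> dict:
    """Run every case through the model and tie the score to the prompt version."""
    passed = sum(
        1 for case in cases
        if case.check(model(prompt.template.format(**case.inputs)))
    )
    return {
        "prompt": prompt.name,
        "version": prompt.version,
        "passed": passed,
        "total": len(cases),
    }


def fake_model(text: str) -> str:
    # Assumption: placeholder for a real LLM call, so the sketch runs offline.
    return "positive" if "great" in text.lower() else "negative"


if __name__ == "__main__":
    prompt = PromptVersion("sentiment", "Classify the sentiment: {text}")
    cases = [
        EvalCase({"text": "This is great"}, lambda out: out == "positive"),
        EvalCase({"text": "This is awful"}, lambda out: out == "negative"),
    ]
    print(evaluate(prompt, fake_model, cases))
```

The point of hashing the template is that a score is only meaningful next to the exact prompt text that produced it. A real layer would also need to pin the model and its parameters, store results over time, and handle non-deterministic outputs, none of which this sketch attempts.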