| HN Mirror

by intended 990 days ago

Don't.

If you must, know that tasks fall on a continuum from suitable to risky in production settings.

Definitely choose tasks on the suitable end of that scale (e.g. text generation or classification).

More complex tasks like data-to-text or summarization? I personally would always avoid them, unless your team or task has a very specific workflow that calls for them.

Further, it's not just test cases; it's an entire evaluation and prompt-versioning layer. Of the few that I am aware of, most are not even openly available (including Azure Prompt flow).
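To make "evaluation and prompt-versioning layer" concrete, here is a rough sketch of the smallest thing I would call one. It is not how Azure Prompt flow or any other tool works. Every name in it (PromptVersion, EvalCase, evaluate, stub_model) is hypothetical, and the model is a keyword stub standing in for a real LLM call. The idea: a prompt's version is a hash of its text, so any edit produces a new version, and each evaluation report is tied to the exact version it scored.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptVersion:
    """A named prompt template (hypothetical type for this sketch)."""
    name: str
    template: str

    @property
    def version(self) -> str:
        # Content hash: any edit to the template yields a new version id.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class EvalCase:
    """One labeled example for a classification prompt."""
    text: str
    expected: str


def evaluate(prompt: PromptVersion, cases: list[EvalCase],
             model: Callable[[str], str]) -> dict:
    """Run every case through the model and report accuracy per prompt version."""
    correct = 0
    for case in cases:
        output = model(prompt.template.format(text=case.text))
        if output.strip().lower() == case.expected:
            correct += 1
    return {
        "prompt": prompt.name,
        "version": prompt.version,
        "accuracy": correct / len(cases),
    }


def stub_model(prompt_text: str) -> str:
    # Assumption: stand-in for a real LLM call, using a trivial keyword rule.
    return "positive" if "great" in prompt_text.lower() else "negative"


PROMPT_V1 = PromptVersion(
    name="sentiment",
    template="Classify the sentiment of this review as positive or negative: {text}",
)
PROMPT_V2 = PromptVersion(
    name="sentiment",
    template="Is this review positive or negative? Answer in one word: {text}",
)

CASES = [
    EvalCase("This product is great", "positive"),
    EvalCase("It broke after one day", "negative"),
    EvalCase("Great value, would buy again", "positive"),
    EvalCase("Not bad at all", "positive"),  # the stub gets this one wrong
]

report = evaluate(PROMPT_V1, CASES, stub_model)
print(report)
```

A real layer would also persist these reports and diff them across versions, so a prompt edit that drops accuracy gets caught before it ships. That part is left out here.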