by muzani
230 days ago
Reproduction, I suppose. I would like the same things as OP too. LLM outputs are qualitative: they can't really be scored automatically, and prompt enhancements tend to multiply the bug. A change can solve one problem but introduce a new one. It's more practical just to do it manually.