|
|
|
|
|
by arjie
4 hours ago
|
|
These papers usually have poor stability to prompting and rerunning. It would be nice if we had some kind of meta-evaluation metric where rewriting the prompt conditions or varying the input params could be used to determine how stable a result is. Regardless, it's definitely true that AI agents have different priorities from us. That's what alignment is about anyway. |
|
So you create leading prompts like that, and re-run until you get a publishable session.