by vunderba 207 days ago
100%. Between tuning prompt variations for each model and allowing a minimum number of re-rolls, it takes a while to publish results from the newest models on my GenAI comparison site.

Including a "total rolls" count is very valuable, since it helps indicate how steerable the model is.
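
For anyone wanting to track something similar, here is a rough sketch of how a "total rolls" count could be recorded. It is not how my site does it; `generate`, `accept`, and the `max_rolls` budget are all hypothetical stand-ins for a model call, a pass/fail judgment, and whatever re-roll limit you settle on.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RollResult:
    passed: bool      # did any attempt satisfy the check?
    total_rolls: int  # attempts actually used, including the first


def roll_until_pass(
    generate: Callable[[int], str],  # hypothetical: attempt number -> model output
    accept: Callable[[str], bool],   # hypothetical: pass/fail judgment on an output
    max_rolls: int = 8,              # assumed budget, not a value from this thread
) -> RollResult:
    """Re-roll until an output passes or the budget runs out."""
    for attempt in range(1, max_rolls + 1):
        if accept(generate(attempt)):
            return RollResult(passed=True, total_rolls=attempt)
    return RollResult(passed=False, total_rolls=max_rolls)


# Stub "model" that only gets it right on the third try.
result = roll_until_pass(lambda n: f"output-{n}", lambda out: out == "output-3")
print(result)  # RollResult(passed=True, total_rolls=3)
```

A lower `total_rolls` for the same prompt would suggest the model is easier to steer, which is the point of reporting it next to the pass/fail result.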