by Kerbonut 795 days ago
You're essentially mixing units. "2 out of 20" is not comparable to "first try". I would have liked to see you run all of them 20 times and add comments like "this got it right on the first try", which could also have been luck. I mean, if it got 1 out of 20 but happened to get it right on the first try, is that better or worse than 2 out of 20?

Like I said, it's a quick test, not a benchmark. The original question was about getting at least one out of 10 right (https://www.astralcodexten.com/p/a-guide-to-asking-robots-to...). Feel free to run them yourself; it takes 5 minutes.
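
For anyone who wants to compare the two ways of counting, here is a rough sketch of how a per-try success rate maps to "at least one out of 10". It assumes every try is independent with the same success rate, which may not hold for real models, and the function name `p_at_least_one` is just made up for illustration. The 1/20 and 2/20 rates are the ones from the comment above, not measured results.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability of at least one success in n independent tries,
    assuming each try succeeds with the same probability p."""
    return 1.0 - (1.0 - p) ** n


# Hypothetical per-try rates taken from the "1 out of 20" vs "2 out of 20" example.
for successes, trials in [(1, 20), (2, 20)]:
    p = successes / trials
    print(
        f"{successes}/{trials}: "
        f"first-try chance {p:.2f}, "
        f"at least one in 10 tries {p_at_least_one(p, 10):.2f}"
    )
```

Under those assumptions, a single first-try hit says little on its own: the per-try rate is what determines how often "at least one out of 10" comes up.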