Hacker News new | ask | show | jobs
by fenomas 1188 days ago
Ah thanks - for others here is the link, though TFA may not necessarily have used the same prompt: https://docs.google.com/document/d/1_eg_jiUE5y8e5zeiz5HCGDc2...

Based on that it looks like the author asked all 25 test puzzles in one big prompt, which one supposes would favor larger models. To compare "puzzle solving" you'd think it would make more sense to ask one puzzle at a time?

1 comments

I tried it both ways; with individual prompts and prompts in bulk. I ran both tests the same way. There's a tradeoff in writing a legible/interesting blog post and relating step-by-step the way the evaluation was ran! Appreciate you reading and the feedback :)