by comex 1183 days ago
Clicking through to the link next to 'last week's text' and then to 'full rules', it looks like the author is starting the chat sessions with a full explanation that isn't included in the screenshots. (Also, the last screenshot shows the author explicitly asking about rhymes.)

Ah, thanks. For others, here is the link, though TFA may not necessarily have used the same prompt: https://docs.google.com/document/d/1_eg_jiUE5y8e5zeiz5HCGDc2...

Based on that it looks like the author asked all 25 test puzzles in one big prompt, which one supposes would favor larger models. To compare "puzzle solving" you'd think it would make more sense to ask one puzzle at a time?

I tried it both ways: with individual prompts and with prompts in bulk. I ran both tests the same way. There's a tradeoff between writing a legible/interesting blog post and relating step by step how the evaluation was run! Appreciate you reading and the feedback :)