Hacker News
by fenomas 1185 days ago
Am I missing something? Most of TFA is about Bard failing to answer with rhyming words, but in the only prompts shown the author doesn't actually ask for rhyming words. He just says the hint and the name of the puzzle.

Is this not simply: "Bard is worse than ChatGPT at having seen the 'how-to-play' page for my side project during its training"?

Clicking through to the link next to 'last week's text' and then to 'full rules', it looks like the author is starting the chat sessions with a full explanation that isn't included in the screenshots. (Also, the last screenshot shows the author explicitly asking about rhymes.)
Ah thanks - for others here is the link, though TFA may not necessarily have used the same prompt: https://docs.google.com/document/d/1_eg_jiUE5y8e5zeiz5HCGDc2...

Based on that it looks like the author asked all 25 test puzzles in one big prompt, which one supposes would favor larger models. To compare "puzzle solving" you'd think it would make more sense to ask one puzzle at a time?

I tried it both ways: with individual prompts and with prompts in bulk. I ran both tests the same way. There's a tradeoff between writing a legible/interesting blog post and relating step by step how the evaluation was run! Appreciate you reading and the feedback :)
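
For anyone who wants to try the one-at-a-time vs. bulk comparison discussed above, here is a rough sketch of how the two kinds of prompts could be built. Everything in it is an assumption for illustration: `RULES` is a placeholder for the full rules text, the hints are made up, and the function names are hypothetical. It is not the author's actual prompt or harness, and it doesn't call any model API.

```python
# Hypothetical sketch: one prompt per puzzle vs. one prompt with every puzzle.
# RULES is a placeholder, not the author's real rules text.
RULES = "<full rules of the puzzle go here>"


def individual_prompts(hints):
    """Return one prompt per puzzle, each with the rules prepended."""
    return [f"{RULES}\n\nHint: {hint}" for hint in hints]


def bulk_prompt(hints):
    """Return a single prompt holding the rules and all puzzles, numbered."""
    numbered = "\n".join(f"{i}. {hint}" for i, hint in enumerate(hints, start=1))
    return f"{RULES}\n\nHints:\n{numbered}"


if __name__ == "__main__":
    hints = ["first made-up hint", "second made-up hint"]  # placeholder hints
    for prompt in individual_prompts(hints):
        print(prompt)
        print("---")
    print(bulk_prompt(hints))
```

Sending each string from `individual_prompts` in its own fresh session, and the `bulk_prompt` string in a single session, would give the two setups to compare.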