by riku_iki
636 days ago
And what prompt did you give them to generate the program? Did you tell them explicitly that they need to fill the cornered cells? If so, that is not what the benchmark is about. The benchmark asks the LLM to figure out the pattern by itself. I gave the task to Claude and asked it to write Python code, and it failed to recognize the pattern: "To solve this puzzle, we need to implement a program that follows the pattern observed in the given examples. It appears that the rule is to replace 'O' with 'X' when it's adjacent (horizontally, vertically, or diagonally) to exactly two '@' symbols. Let's write a Python program to solve this:"
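For concreteness, here is a minimal sketch of the rule as stated in that answer, read literally. This is not the code Claude produced, and it is not the benchmark's actual pattern. The function name `apply_guessed_rule` and the list-of-strings grid format are assumptions made for illustration.

```python
def apply_guessed_rule(grid):
    """Return a new grid where each 'O' with exactly two '@' among its
    8 neighbours (horizontal, vertical, diagonal) becomes 'X'.

    `grid` is assumed to be a list of strings, one string per row.
    """
    rows = len(grid)
    out = []
    for r in range(rows):
        new_row = []
        for c in range(len(grid[r])):
            cell = grid[r][c]
            if cell == "O":
                count = 0
                # Look at all 8 surrounding cells, skipping the cell itself.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        if dr == 0 and dc == 0:
                            continue
                        nr, nc = r + dr, c + dc
                        # Ignore neighbours that fall outside the grid.
                        if 0 <= nr < rows and 0 <= nc < len(grid[nr]):
                            if grid[nr][nc] == "@":
                                count += 1
                if count == 2:
                    cell = "X"
            new_row.append(cell)
        out.append("".join(new_row))
    return out
```

Neighbour counts are always taken from the input grid, so earlier replacements do not affect later cells.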
|
It used its CoT to work out the cornering, and then I got it to write a program.
But when I try again, it's not reliable.