| This game is well known in the UK as the "Connecting Wall" from Only Connect. This result, poor ChatGPT performance, surprises me. I thought pattern detection and set forming were things ChatGPT could do well. Perhaps it would need a model specifically trained for this task. If AlphaZero can master chess, then surely this game isn't beyond what is trainable. You can tell ChatGPT that it'll be playing the Connecting Wall without having to explain the game. It still fails to make a good set of connections when given the wall. One interesting feature of the "Connecting Wall" sets is that there is almost always a "wordy" one involving changing a letter, adding a prefix, anagrams, etc. There is also almost always a "person" one: for example, there'll be a set of "Famous people named Tom..." but not a set of "Toms" alongside a set of "Margarets". The rest are a couple of general sets. Knowing this is a huge help given the 2 minutes and 30 seconds provided. On another note, it's possible that the GCHQ puzzle book is in the training set; it has many puzzles with solutions for this format and a very similar rubric with 55 items and sets of sizes 1 through 10. That said, ChatGPT perhaps would not tie the answers in the back of the book to the puzzles in the front. All in all, I think an AI trained for this purpose on problems with given solutions ought to end up mastering this format. But general-purpose ChatGPT seems to perform very badly. |
I would speculate it’s struggling because of the linear nature of its output and the red-herring words that cross over between categories.
Because the model can’t “look ahead”, it starts spitting out valid combinations without being able to anticipate that committing to a certain combination early on will lead to a mistake later.
I expect that if you asked it to correct its output in a follow-up message, it could do so without much difficulty.
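To make the "can't look ahead" point concrete, here is a toy sketch in Python. It is only an analogy, not a claim about how the model works internally. The mini wall, the candidate groups, and the function names `greedy` and `backtrack` are all made up for illustration. "Mars" is the red herring: it fits both the planets and the chocolate bars.

```python
from typing import List, Optional, Tuple

Group = Tuple[str, frozenset]

# Hypothetical 8-word mini wall (a real wall has 16 words in 4 groups of 4).
WALL = frozenset({"Mercury", "Venus", "Mars", "Saturn",
                  "Jupiter", "Twix", "Bounty", "Snickers"})

# Every connection a solver might spot. The first one is valid on its own
# but uses up "Mars", which the chocolate-bar group needs.
CANDIDATES: List[Group] = [
    ("planets", frozenset({"Mercury", "Venus", "Mars", "Saturn"})),
    ("planets", frozenset({"Mercury", "Venus", "Saturn", "Jupiter"})),
    ("chocolate bars", frozenset({"Mars", "Twix", "Bounty", "Snickers"})),
]

def greedy(remaining: frozenset, candidates: List[Group]) -> List[Group]:
    """Commit to each group as soon as it fits; never revisit a choice."""
    chosen = []
    for name, group in candidates:
        if group <= remaining:  # all four words still on the wall
            chosen.append((name, group))
            remaining = remaining - group
    return chosen

def backtrack(remaining: frozenset,
              candidates: List[Group]) -> Optional[List[Group]]:
    """Try a group, and undo the choice if the rest of the wall can't be solved."""
    if not remaining:
        return []  # every word has been placed
    for i, (name, group) in enumerate(candidates):
        if group <= remaining:
            rest = backtrack(remaining - group, candidates[i + 1:])
            if rest is not None:
                return [(name, group)] + rest
    return None  # dead end: the caller must try a different group

if __name__ == "__main__":
    print("greedy:   ", [sorted(g) for _, g in greedy(WALL, CANDIDATES)])
    print("backtrack:", [sorted(g) for _, g in backtrack(WALL, CANDIDATES)])
```

The greedy pass commits to the first planets group and then gets stuck with Jupiter and three chocolate bars. The backtracking pass abandons that choice and finds both groups. A follow-up message asking for a correction plays roughly the role of that second pass.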