Beyond being able to solve Connections, can an LLM generate good (challenging but solvable) Connections puzzles? It would be pretty cool to be able to generate a test set.
QwQ gets it wrong, and so does Gemini. o1 gets it right. R1 finds a pretty good set of 4 that wasn't the originally intended one... tempted to give it partial credit. 4o gets it wrong. Will update with Claude once my usage limits reset, lol.
https://chatgpt.com/share/67570ab1-b2c0-8006-b5d2-d3fa7132de...
Going to try feeding this into some other LLMs like QwQ to see if they can solve them.