by daemonologist
890 days ago
I suspect the model's inability to "plan ahead" is a significant contributor to its poor performance relative to a human. Being able to check that a grouping includes at least four words _and_ that it doesn't conflict with the other three groupings is a major advantage, since these puzzles commonly include partial or incompatible red-herring groups. If this is the case, performance might be improved by taking the final solving responsibility away from the model and giving it to the script. You could ask GPT for categories, ask whether each word fits each category (discarding categories with fewer than four words), and then search for four non-overlapping categories. (This might be missing the point of the exercise though.)
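
To make the last step concrete, here is a rough sketch of what the script-side search could look like. It assumes the model's answers have already been collected into a mapping from category name to the set of words it said fit; the names `solve` and `fits` and the example puzzle are made up for illustration, not taken from the original script. It drops categories with fewer than four candidates and then backtracks to find four disjoint groups covering all sixteen words.

```python
from itertools import combinations


def solve(words, fits, group_size=4):
    """Return a list of (category, group) pairs that partition `words`, or None.

    `fits` is a hypothetical mapping: category name -> words the model said fit.
    """
    all_words = frozenset(words)
    # Keep only candidate words that are on the board, then discard
    # categories with fewer than `group_size` candidates.
    candidates = {c: frozenset(ws) & all_words for c, ws in fits.items()}
    candidates = {c: ws for c, ws in candidates.items() if len(ws) >= group_size}

    def search(remaining, unused):
        if not remaining:
            return []
        # Always place the alphabetically first remaining word next, so the
        # same partition is not explored in several different orders.
        target = min(remaining)
        for cat in sorted(unused):
            pool = candidates[cat] & remaining
            if target not in pool:
                continue
            for rest in combinations(sorted(pool - {target}), group_size - 1):
                group = frozenset(rest) | {target}
                result = search(remaining - group, unused - {cat})
                if result is not None:
                    return [(cat, group)] + result
        return None  # no category can place `target`: dead end, backtrack

    return search(all_words, frozenset(candidates))


# Made-up example: "bass", "pike" and "mace" each fit two categories, and
# "things with strings" is a partial red herring with only three words.
words = ["bass", "trout", "salmon", "perch", "drum", "guitar", "piano", "flute",
         "pike", "sword", "axe", "spear", "mace", "clove", "sage", "thyme"]
fits = {
    "fish": {"bass", "trout", "salmon", "perch", "pike"},
    "instruments": {"bass", "drum", "guitar", "piano", "flute"},
    "weapons": {"pike", "sword", "axe", "spear", "mace"},
    "spices": {"mace", "clove", "sage", "thyme"},
    "things with strings": {"guitar", "piano", "bass"},
}
solution = solve(words, fits)
for category, group in solution:
    print(category, sorted(group))
```

The search only ever commits to a group if the remaining words can still be partitioned, which is the "check it doesn't conflict with the other three groupings" step the model seems to struggle with. It is exhaustive, but with sixteen words and a handful of categories that is cheap.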