I think the actual guessing space for these free-response problems is much smaller than it looks, because simple priors over the question rule out most answers. For example:
“Richard, Jerry, and Robert are going to share 60 cherries. If Robert has 30 cherries, and has 10 more than Richard, how many more cherries does Robert have than Jerry?”
Even a rudimentary model will likely already know that the answer is between 0 and 60.
Knowing that the answer involves addition and subtraction narrows it down to maybe 8 answers.
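To make that narrowing concrete, here is a toy sketch of the idea. It is not how a model actually forms its priors; it just assumes (my assumption, for illustration) that a plausible answer is any value you can reach by adding or subtracting each number stated in the problem at most once, kept within the 0 to 60 range. The function name `candidate_answers` is hypothetical.

```python
from itertools import product

def candidate_answers(numbers, low, high):
    """Collect every value reachable by adding or subtracting
    each number at most once, restricted to [low, high]."""
    found = set()
    # Each number gets a coefficient: -1 (subtract), 0 (unused), +1 (add).
    for signs in product((-1, 0, 1), repeat=len(numbers)):
        total = sum(s * n for s, n in zip(signs, numbers))
        if low <= total <= high:
            found.add(total)
    return sorted(found)

numbers = [60, 30, 10]  # the quantities stated in the cherry problem
candidates = candidate_answers(numbers, 0, 60)
print(candidates)
print(f"{len(candidates)} candidates out of {60 - 0 + 1} integers in range")
```

Under these assumptions the sketch leaves 7 candidates out of the 61 integers in range, and the correct answer (Robert has 30, Richard 20, Jerry 10, so the difference is 20) is among them. That is in the same ballpark as the "maybe 8" estimate above.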
While SAT problems have only 4 answer choices, there’s usually one trick/trap answer, which I think might be difficult for a model to avoid guessing by accident. The analogy I can think of is that it’s sometimes better to cover up the answer choices first and work out a solution, so as not to be biased by any particular choice.