by gwern 1691 days ago
SAT problems are multiple choice, with 5 options. So 50% is barely twice random guessing (1/5).

See how far randomly guessing an integer 1-1000 gets you with OP's word problems with freeform responses.

2 comments

I think the actual guessing space for these free response problems is much smaller, through simple priors over the question. For example:

“Richard, Jerry, and Robert are going to share 60 cherries. If Robert has 30 cherries, and has 10 more than Richard, how many more cherries does Robert have than Jerry?”

A rudimentary model will likely already know the answer is between 0 and 60.

Knowing that the answer involves addition and subtraction narrows it down to maybe 8 answers.
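To put rough numbers on that, here is a quick sketch. It assumes a guesser that only tries sums and differences using each number from the problem at most once. That rule and the `candidate_answers` helper are my own illustration, not a claim about how any actual model behaves.

```python
from itertools import product

# Numbers that appear in the cherry problem.
numbers = [60, 30, 10]


def candidate_answers(nums, low, high):
    """Values reachable by adding/subtracting each number at most once.

    Assumption for illustration: each number gets a coefficient of
    -1, 0, or +1, and only totals within [low, high] count as guesses.
    """
    found = set()
    for signs in product((-1, 0, 1), repeat=len(nums)):
        if not any(signs):
            continue  # skip the empty combination (no numbers used)
        total = sum(s * n for s, n in zip(signs, nums))
        if low <= total <= high:
            found.add(total)
    return sorted(found)


candidates = candidate_answers(numbers, 0, 60)

# Working the problem directly, for comparison.
richard = 30 - 10             # Robert has 10 more than Richard
jerry = 60 - 30 - richard     # the rest of the 60 cherries
answer = 30 - jerry           # how many more Robert has than Jerry

print("candidates:", candidates)
print("correct answer:", answer, "in candidates:", answer in candidates)

# Chance of a uniform random guess being right under each prior.
for label, size in [
    ("any integer 1-1000", 1000),
    ("any integer 0-60", 61),
    ("add/subtract candidates", len(candidates)),
    ("5-option multiple choice", 5),
]:
    print(f"{label}: {1 / size:.1%}")
```

Under those assumptions the guessing space collapses to six values (10, 20, 30, 40, 50, 60), and the correct answer, 20, is one of them. A blind guess is then right about 1 time in 6, roughly the same odds as a 5-option multiple choice question and nowhere near 1 in 1000.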

While SAT problems have only 4 answers, there’s usually one trick/trap answer, which I think might be difficult for a model to avoid guessing by accident. The analogy I can think of is that sometimes it’s better to cover up the answers first and work out a solution, so you don't get biased by any particular answer choice.

"barely twice random guessing" is the median score for high school students who take the SAT.