Hacker News
by dannyz 1194 days ago
It would be interesting to see some example questions and answers. Since the test is multiple choice, is it possible that the model has simply gotten very good at estimating how likely each possible answer is?
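To make the question concrete: on a multiple-choice test, a model does not have to produce an answer from scratch. It only has to rank the listed choices. Here is a minimal sketch of that idea. It assumes we already have some per-option log-likelihood from a model. The helper `pick_answer` is hypothetical, and the scores are invented purely for illustration. It normalizes the scores over the options with a softmax and picks the most likely one.

```python
import math

def pick_answer(option_logprobs: dict[str, float]) -> tuple[str, dict[str, float]]:
    """Hypothetical helper: choose the option with the highest log-likelihood.

    Also returns the scores renormalized into probabilities over just the
    offered options (a softmax), so they sum to 1.
    """
    # Subtract the max log-prob before exponentiating for numerical stability.
    m = max(option_logprobs.values())
    exps = {k: math.exp(v - m) for k, v in option_logprobs.items()}
    total = sum(exps.values())
    probs = {k: v / total for k, v in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs

# Made-up log-likelihoods a model might assign to each choice (illustrative only).
scores = {"A": -4.2, "B": -1.3, "C": -3.8, "D": -5.0}
best, probs = pick_answer(scores)
print(best)  # B
```

With four choices, random guessing gets 25%. A model that only ranks the options well could therefore score highly without ever generating an answer unprompted.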