by WanderPanda 980 days ago
Why don't these benchmarks score the likelihood the model assigns to the example answer? Just taking the MAP prediction and checking it against the reference seems like a waste of information.
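To make that concrete, here's a rough sketch with toy numbers. Everything in it is made up for illustration: the two "models" are hand-written per-position probability tables, and `map_exact_match` / `reference_log_likelihood` are hypothetical helper names, not part of any benchmark harness. It also ignores conditioning on previous tokens, which a real autoregressive model would need (teacher forcing on the reference).

```python
import math

def map_exact_match(step_probs, reference):
    """Score 1.0 if the per-position argmax equals the reference, else 0.0."""
    prediction = [max(p, key=p.get) for p in step_probs]
    return 1.0 if prediction == reference else 0.0

def reference_log_likelihood(step_probs, reference):
    """Sum of log-probabilities the model assigns to the reference tokens."""
    return sum(math.log(p[tok]) for p, tok in zip(step_probs, reference))

# Toy, hand-written distributions (assumption: not real model outputs).
model_a = [{"yes": 0.49, "no": 0.51}]  # almost right
model_b = [{"yes": 0.01, "no": 0.99}]  # confidently wrong
reference = ["yes"]

for name, model in [("A", model_a), ("B", model_b)]:
    em = map_exact_match(model, reference)
    ll = reference_log_likelihood(model, reference)
    print(f"model {name}: exact match = {em}, log-likelihood = {ll:.3f}")
```

Both toy models get an exact-match score of 0, but the log-likelihood of the reference separates the near miss from the confident mistake, which is the information the MAP-only score throws away.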