Hacker News new | ask | show | jobs
by gus_massa 266 days ago
Bad training weights. They gave 1 point for each correct answer and 0 for each incorrect one, so the model learned to bullshit and complete with random nonsense.

Next time, they will use 1 point for each correct answer and -.1 for each incorrect one, and 0 for "I don't know" and the model will behave. (And perhaps add some intermediate value for "I guess that [something]".)

We do that in the university. If the exam has 0 points for bad answers, I encourage my students to answer all of them.