Hacker News new | ask | show | jobs
by fellowniusmonk 184 days ago
5.2 seems worse on overfitting for esoteric logic puzzles in my testing. Tests using precise language where attention has to be paid to use the correct definition among many for a given word. It charges ahead with wrong definitions in a far lower accuracy and worse way now.