Hacker News
by driverdan 523 days ago
IMO, SO questions are not a good evaluation. These models were likely trained on the top 1000 most popular StackOverflow questions, so you'd expect them to produce similar results and perform well when compared to the original answers.