Hacker News
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases (arxiv.org)
2 points by BalinKing 85 days ago