by stared 117 days ago
I reran it with the "high" and "xhigh" effort settings, and GPT-5.2-Codex still gets 0% false positives, while matching the other top models at localizing backdoors:
https://quesma.com/benchmarks/binaryaudit/