by mens_rea 155 days ago
Deeply flawed paper for several reasons:

* Small data set of 2 runs (!!)

* Exaggerated claims (saying A1 beat 50% of testers, yet only 4/10 testers found FEWER vulns than A1, and A1 had a nearly 50% false positive rate)

* AI agents were given 16 hours while human testers were given 10

* Their human testers gave up when a modern browser refused to open a webpage with weak TLS ciphers, so... clearly not professional testers, unless the bar is REALLY low these days