by mens_rea
155 days ago
Deeply flawed paper for several reasons:

* Small data set of 2 runs (!!)
* Exaggerated claims: it says A1 beat 50% of testers, yet only 4/10 testers found fewer vulns than A1, and A1 had a nearly 50% false positive rate
* AI agents were given 16 hours while human testers were given 10
* Their human testers gave up when a modern browser refused to open a webpage with weak TLS ciphers, so clearly not professional testers, unless the bar is REALLY low these days