Hacker News
by EE84M3i 162 days ago
Would be curious to hear your hypothesis on what's the remaining 10-20% that might be out of reach? Business logic bugs?

Honestly, I'm just trying to be nice about it. I don't know that I can tell you a story about the 90% ceiling that makes any sense, especially since you can task 3 different high-caliber teams of senior software security people on an app and get 3 different (overlapping, but different) sets of vulnerabilities back. By the end of 2027, if you did a triangle test, 2:1 agents/humans or vice versa, I don't think you'd be able to tell them apart.

Just registering the prediction.

I would take the other side of that bet.

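  # More than 10 emoji in the report => probably written by an agent.
  # \p{Emoji} also matches ASCII digits, '#' and '*'; Emoji_Presentation does not (needs a recent PCRE2).
  $ grep -oP '\p{Emoji_Presentation}' vulns.md | wc -l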
I don't understand what you're trying to say here.
Just that the superficial details of how AIs communicate (e.g. with lots of emojis) might give them away in any triangle test :)
I see this emoji thing being mentioned a lot recently, but I don't remember ever seeing one. Granted, I rarely use AI, and when I do it's on duck.ai. What models are (ab)using emojis?
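
For anyone who wants to try the heuristic without depending on how their grep was built, here is a rough Python sketch of the same idea. The function names, the code point ranges, and the threshold of 10 are illustrative assumptions, not anything from a real tool. The ranges only approximate "pictographic" emoji; Unicode's full Emoji property also covers ASCII digits, '#' and '*', which would inflate the count on any report full of CVE numbers.

```python
# Hypothetical sketch of the "count the emoji" tell.
# The ranges are an approximation chosen for illustration, not the
# complete Unicode Emoji property.
EMOJI_RANGES = (
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport symbols and neighbouring blocks
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
)


def count_emoji(text: str) -> int:
    """Count code points that fall inside the approximate emoji ranges."""
    return sum(
        1
        for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)
    )


def looks_agent_written(text: str, threshold: int = 10) -> bool:
    """Flag the text when it has more than `threshold` emoji (assumed cutoff)."""
    return count_emoji(text) > threshold
```

To run it on a report you would do something like `looks_agent_written(open("vulns.md", encoding="utf-8").read())`. Obviously a tell like this stops working the moment someone tells the agent to drop the emoji.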
Ah! Touché.