by crimsonnoodle58 119 days ago
That theory is being tested. So far no prompt injection has broken in:

https://hackmyclaw.com/

4 comments

It's a neat idea, but an agent that spends pretty much all its time wading through an email inbox that's 99% repeated prompt injection attempts isn't exactly a plausible real-world setup. As the creator acknowledges in the original thread, its context/working memory is going to be unusually cognizant of prompt injection risk at any given time, compared with the more typical helpful-agent "mindset" of fulfilling normal day-to-day requests, where a malicious prompt might be slipped in via any one of dozens of different infiltration points without the convenience of a static "prompt injection inbox".
https://x.com/benhylak/status/2025873646724800835

turns out it doesn’t even need to be an attacker…

Mostly because no one cares about trying to hack "hackmyclaw". There is zero value in it for any serious attacker. Why would they waste their time on a zero-value target?

The only attempts to hack "hackmyclaw" were casual ones from HN readers when it was first posted.

Meanwhile, tons of actual OpenClaw users have been owned by malware which was downloaded as Skills.

Also, there have been plenty of actual examples of prompt injection working, including attacks on major companies. E.g. Superhuman was hacked recently via prompt injection.

Since when do security researchers and black hats give away their tools for free?