Hacker News
by ranguna 47 days ago
I get what you're saying, but this resonates with me and makes me feel for the author:

Cursor: we have top-notch safeguards for destructive operations, you have our guarantee, we are the best

Author: uses their tools expecting those guarantees to hold (I would expect them to require confirmation before a destructive operation, enforced outside the prompt as a coded system guardrail)

Cursor AI: Does destructive operation without asking

Author: feels betrayed.

So yeah, I think the author is right: they trusted Cursor to have better system guardrails, and it didn't (agents shouldn't be able to delete a volume without a meta-guardrail outside the prompt). Now the author knows, and so do we: even if companies say they have good guardrails, never trust them. If it's not your code, you have no guarantees.
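A minimal Python sketch of the kind of coded guardrail described above, assuming a hypothetical agent loop in which every model-proposed tool call passes through `run_tool_call` before anything executes. The names (`DESTRUCTIVE`, `ToolCall`, `run_tool_call`) are made up for illustration; this is not how Cursor implements anything. The point is only that the confirmation step lives in code the model cannot talk its way around.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical set of operations the tool layer treats as destructive.
# It lives in code, not in the prompt, so the model cannot edit it.
DESTRUCTIVE = {"delete_volume", "drop_database", "force_push"}


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def run_tool_call(
    call: ToolCall,
    execute: Callable[[ToolCall], str],
    confirm: Callable[[ToolCall], bool],
) -> str:
    """Run a model-proposed tool call, asking a human first if it is destructive.

    `confirm` is only called for destructive operations; if it returns False
    the call is never executed, whatever the model's output said.
    """
    if call.name in DESTRUCTIVE and not confirm(call):
        return f"blocked: {call.name} requires human confirmation"
    return execute(call)
```

In an interactive setup, `confirm` could be something like `lambda c: input(f"Run {c.name}? [yes/no] ") == "yes"`. A fixed list of names only catches operations someone thought to list, so it is a floor rather than a guarantee.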


Sorry, it's still the author's fault. They didn't understand how LLMs work. They thought Cursor had implemented some magic "I control every action the LLM takes" thing. That's impossible.
Right. But Cursor _said_ they had some magic. At some point you have to trust vendors. I don't know exactly how AWS guarantees eleven nines of durability on S3, but I sure hope that they do.
Here is what they say: at the very top, they explain that LLMs are inherently unreliable. It looks like they offer security tools and safeguards, but they also provide an auto-run option. A vendor can't really be held responsible for someone shooting themselves in the face. You can argue that they shouldn't provide that option, but it's what people want, so they do, with warnings.

It sounds like this user either didn't use the security controls, approved prompts they didn't understand, or disabled the checks entirely. Having worked in IT/tech for a big chunk of my life and seen all the dumb crap people do, even people who know better, I would bet my house on that being the most likely scenario rather than Cursor somehow being at fault here.

https://cursor.com/docs/enterprise/llm-safety-and-controls

Yeah, and when you interview the junior dev who also convinces you they're smart and have something special, they also delete prod, and guess what... not that dev's fault.
> At some point you have to trust vendors.

You absolutely do not. When someone makes an unbelievable claim, such as having magic guardrails for LLMs that prevent dangerous actions (what would that even mean?!), you don’t have to trust that claim.

If you trust someone’s claim without justification, that’s on you.

Yeah. It would be pretty dumb for them to make that kind of claim.

Thanks for providing that doc.

> At some point you have to trust vendors. I don't know exactly how AWS guarantees eleven nines of durability on S3. But I sure hope that they do.

Trust is earned; it's built on reputation at the individual, corporate, and industry-wide levels. AWS has 20 years of reputation against which I can judge the value of their promises.

Not only has the LLM industry (it is not "AI" and never will be) absolutely not earned anything like that level of trust, the thing the technology has proven most effective at is in fact scamming. Making up something that looks/sounds convincing, especially if you aren't thinking too hard about it, is what they're best at. Combine that with a lot of money flying around and trust levels should be somewhere around "Elon Musk promises".

At this point there have been so many blatant examples of why you should never give an LLM "agent" control over production systems, but the allure of just giving some vague direction to a chatbot and telling it not to screw things up is just irresistible to some, like Sideshow Bob stepping on rakes [1].

If everyone around you is whacking themselves in the face with the rake, and you know you can avoid it just by using your brain and not stepping on the rake, or avoid it entirely by keeping your rakes contained, but a rake vendor comes to you saying that they have instead built a new rake that they swear won't whack you in the face even if you leave it right in your walking path, do you trust them?

1: https://www.youtube.com/watch?v=ouau9SVVrBA

I mean, AWS doesn't really "guarantee" anything; they just say that if they can't meet the bar, they'll refund you in credits, which is equivalent to money.
Yeah, I wasn't clear with "the author is right". I think they are right to be frustrated, but that doesn't clear them of their own fault in the matter. It's just that it wasn't their fault alone.

This is not a polarizing issue: it's not just the author's fault, or Cursor's fault, or society's fault. It's everyone's, and we've all got something to learn from this.

Impossible?

You just have to add a human in the loop for destructive calls. Add a TOTP parameter to destructive calls that is generated from the agent UI: a human has to click a button, which generates a code that is sent to the model and used in the call.
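One way the TOTP idea could look, as a rough Python sketch. It assumes a hypothetical tool server that holds the shared secret and a UI "Approve" button that calls `ApprovalGate.approve()`; the model only ever sees the resulting six-digit code, never the secret, so it cannot mint approvals itself. `totp` follows RFC 6238 with its defaults (HMAC-SHA1, 30-second steps); `ApprovalGate` and `delete_volume` are illustrative names, not any vendor's API.

```python
import hashlib
import hmac
import secrets
import struct
import time
from typing import Optional

STEP = 30   # seconds per time step (RFC 6238 default)
DIGITS = 6  # length of the code handed to the model


def totp(secret: bytes, at: float, step: int = STEP, digits: int = DIGITS) -> str:
    """RFC 6238 TOTP: HMAC-SHA1 of the time-step counter, dynamically truncated."""
    counter = int(at // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


class ApprovalGate:
    """Hypothetical approval state shared by the agent UI and the tool server.
    The secret stays here; it never enters the model's context."""

    def __init__(self) -> None:
        self._secret = secrets.token_bytes(20)
        self._last_used_step = -1

    def approve(self, now: Optional[float] = None) -> str:
        # Called when the human clicks "Approve"; the returned code is
        # handed to the model, which passes it along in the tool call.
        return totp(self._secret, time.time() if now is None else now)

    def verify(self, code: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        step = int(now // STEP)
        if step <= self._last_used_step:
            return False  # this time step's approval was already spent
        if not hmac.compare_digest(code, totp(self._secret, now)):
            return False
        self._last_used_step = step
        return True


def delete_volume(volume_id: str, approval_code: str, gate: ApprovalGate) -> str:
    """Hypothetical destructive tool: refuses to run without a fresh approval."""
    if not gate.verify(approval_code):
        raise PermissionError("destructive call needs a fresh human approval code")
    return f"deleted {volume_id}"  # a real tool would delete the volume here
```

Rejecting any second use within the same time step makes each click good for one destructive call. A real deployment would probably also tolerate one step of clock drift, since a code approved just before a 30-second boundary fails just after it; and because the UI and server share state anyway, a random single-use token would do the same job as TOTP here.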

Why do you think this is impossible?

Impossible without a human in the loop.

Having said that, even categorising calls as destructive or non-destructive is inherently unsafe unless you have a very strict OS-level or VM-like setup (everything read-only, with all access to the outside world going through MCPs, so it's the MCP, not the LLM, deciding which calls are destructive, etc.).
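A sketch of the allowlist approach this implies, assuming a hypothetical tool server that sits between the model and the outside world (in the role the comment gives to MCP servers; this is not the MCP SDK's API). Instead of trying to recognise destructive calls, the server only runs tools a human has registered as read-only and refuses everything else by default. The tool names and return values are placeholders.

```python
from typing import Callable, Dict

# Hypothetical registry owned by the tool server. Entries are declared by
# humans in code; the model can only name a tool, it never decides whether
# that tool is safe to run.
READ_ONLY: Dict[str, Callable[..., str]] = {
    "list_volumes": lambda: "vol-1, vol-2",
    "describe_volume": lambda volume_id: f"{volume_id}: 100 GiB",
}


def dispatch(tool: str, **kwargs) -> str:
    """Deny by default: only tools explicitly registered as read-only run.
    Anything else, including tools nobody thought to classify, is refused."""
    handler = READ_ONLY.get(tool)
    if handler is None:
        raise PermissionError(f"{tool!r} is not on the read-only allowlist")
    return handler(**kwargs)
```

The allowlist only helps if the registered handlers really are read-only; a generic shell tool on the list would defeat it no matter how it is labelled.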