| HN Mirror

Didn't the original authors end up leaking this before OpenAI fixed it? They gave them a chance, but then had to decide between staying fully silent or publishing the details despite malicious actors learning about it before it was fixed or leaving users in the dark. They chose it was better to warn users and inform malicious actors despite it not being fixed.

>This vulnerability was responsibly disclosed to OpenAI. Despite multiple follow-ups, we received no communication beyond an automated reply to our initial disclosure. OpenAI's documentation fails to describe sensitive capabilities granted to the model (e.g., running privileged scripts) or risks of model manipulation via indirect prompt injection, instead focusing solely on functional limitations and data-handling concerns. As such, we are publishing our findings to enable informed decision-making regarding the risk surface.

That very last sentence was considered justification of putting this knowledge into the wild when OpenAI refused to fix it. So, if we consider it justified with a delay, then we are saying it is acceptable (it is "responsible") to give the information to malicious actors as long as you tried to warn the right party first.

Compare that to two alternatives. Alternative 1 is never disclosing it to the public until fixed. Saying it is never acceptable to let malicious actors know until it is no longer a concern, even though this will mean users are kept in the dark about the risk.

Alternative 2 is to reduce that timeline to 0. Say that users are immediately warned, despite the risks of making it known to bad actors.

So if we are saying the current delay is acceptable, but both a longer and a shorter delay are unacceptable, then why is that? What justifies the current delay, what makes that the responsible one, rather than a shorter or longer window?

>I can't imagine something being "more responsible" for the public than privately notifying the owning party to give them time to fix the issue, before notifying the rest of the world (including malicious actors) about it.

What about ensure they have fixed it, and only considering it responsible to disclose it when fixed (alternative 1)? If it is never fixed, then the bug is never disclosed, because it is not acceptable to tell malicious actors how to exploit a vulnerability? Even evidence of use wouldn't be justified, as publishing this makes all malicious actors aware of it rather than just a subset of them.

And if you disagree and think some window is reasonable, then apply that argument to a slightly shorter window and repeat until either the argument hits some built in limit or reaches a window of 0.