Hacker News
by smoldesu 1156 days ago
> What do y'all think about these types of exploits

They're not really exploits at all. OpenAI doesn't categorize them as issues; if you sent this to their bounty program, it would be rejected.

> will LLM's always be vulnerable to this sort of attack?

Until they stop getting trained on sensitive/potentially harmful information, I'd wager yes.


Well, better to tell them anyhow: I just did, and this is their response. So you were perfectly right.

____________________

Hello!

Thank you for your submission.

OpenAI is committed to making AI safe and useful for everyone. Before releasing a new system, we thoroughly test it, get expert feedback, improve its behavior, and set up safety measures. While we work hard to prevent risks, we can't predict every way people will use or misuse our technology in the real world.

Model safety issues do not fit well within a bug bounty program, as they are not individual, discrete bugs that can be directly fixed. Addressing these issues often involves substantial research and a broader approach. To ensure that these concerns are properly addressed, please report them using the appropriate form, rather than submitting them through the bug bounty program. Reporting them in the right place allows our researchers to use these reports to improve the model.

Issues related to the content of model prompts and responses are strictly out of scope, and will not be rewarded unless they have an additional directly verifiable security impact on an in-scope service (described below).

Examples of safety issues which are out of scope:

• Jailbreaks/Safety Bypasses (e.g. DAN and related prompts)

• Getting the model to say bad things to you

• Getting the model to tell you how to do bad things

• Getting the model to write malicious code for you

Model Hallucinations:

• Getting the model to pretend to do bad things

• Getting the model to pretend to give you answers to secrets

• Getting the model to pretend to be a computer and execute code

Best regards, Jake

Additional submission-specific information:

The report contains a dangerous and inappropriate conversation where the model provides instructions on how to create napalm, which is a harmful substance.