Hacker News
by awb 1221 days ago
Chatbots claim to have safeguards in place to prevent them from saying anything harmful. If you read the chat linked in the article, you can see how the chatbot first resists answering the question and is then persuaded to answer it.

Google similarly has a safe content filter. The contention is that the chatbot's safe content filter, which is supposedly turned on, is failing to catch some significant cases.

1 comment

I'm not sure if I follow. The problem is that the content filter isn't good enough, despite Google suffering from similar weaknesses?
I think people are concerned about LLM safety because an LLM is capable of dynamically creating new private information. Google can only list links to public information. If a website causes harm or violates the law, Google can manually remove it from its index.

But LLMs need to programmatically understand which dynamic content is appropriate and which is not, which is a much harder problem. And people are reporting on just how hard a problem that is by demonstrating vulnerabilities.

The chatbot says it has explicit rules that prevent it from sharing harmful content, but then it does it anyway.

It would be more akin to Google blacklisting a site and then someone exposing that the site can still be found via Google search.
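
To make the static-versus-dynamic contrast above concrete, here is a toy sketch. Every name, URL, and phrase in it is hypothetical, and it is not how Google or any chatbot actually filters content. It only illustrates why removing a known URL is an exact lookup, while a filter over freshly generated text catches only the wording it already anticipates.

```python
# Toy illustration only: all names, URLs, and phrases here are hypothetical.

# Static case: the search index is a known, enumerable set of URLs.
REMOVED_URLS = {"https://example.com/harmful-page"}


def search_index_allows(url: str) -> bool:
    """A manually removed URL is handled by an exact-match lookup."""
    return url not in REMOVED_URLS


# Dynamic case: the text did not exist until the model generated it.
BLOCKED_PHRASES = ("forbidden topic",)


def naive_output_filter_allows(generated_text: str) -> bool:
    """A phrase blocklist only catches wording it already knows about."""
    lowered = generated_text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


if __name__ == "__main__":
    print(search_index_allows("https://example.com/harmful-page"))
    print(naive_output_filter_allows("Here is the forbidden topic explained"))
    # Same idea, slightly different spelling: the naive filter lets it through.
    print(naive_output_filter_allows("Here is the f0rbidden t0pic explained"))
```

The last call is the analogue of the persuaded chatbot: nothing about the rule changed, but the input moved just outside what the rule was written to recognize.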