

by bko 403 days ago

> “The models are getting better, but they’re also more likely to be good at bad stuff,” said James White, chief technology officer at cybersecurity startup Calypso.

I think safety should be defined as an LLM doing what the user intended it to do. If you ask it for an offensive joke, it should give you one. It shouldn't offer offensive jokes unprompted, but it should comply if asked. If you ask it how to send spam or how to break into computer systems, it should similarly comply. If it's legal for a human being to write a blog post about a topic, the LLM shouldn't be crippled into refusing requests about it. The bad stuff (spamming or breaking into a computer system) is done by the human.

The danger of controlling LLMs in such a way is that it introduces a vector and mechanism for political control. Much like laws intended to "protect the children", these mechanisms will be exploited. So you'll go from "don't teach someone how to make a bomb" to eventually "don't offend [group]" and finally to just "comply".