|
|
|
|
|
by chasd00
118 days ago
|
|
"safe" is such a subjective concept to begin with, have any of the model providers ever defined what they mean by "safe"? It doesn't mean much to me if a safe model is one that does not output the recipe for mustard gas, that information is trivially available elsewhere. Or, is a safe model one that doesn't come off as racist? Ok but i would classify that as unoffensive instead of safe but I admit definitions of words can be fluid and change. Is a safe model one that refuses to produce code for a weapons system? Well.. does a PID controller count? I can use that to keep a gun pointed at a target or i can use that to prevent a baby rocker from falling over. Maybe they're giving up on "safe" because there's no definitive way to know if a model is safe or not. I've always held the opinion that ai safety was more about brand safety. Maybe now the model providers can afford some bad press and it not be the death of their company. |
|
Of course once you have that framing, additional goals like "don't give people psychosis", "don't give step-by-step instructions on making explosives, even if wikipedia already tells you how to do it" or "don't harm our company's reputation by being racist" are conceptually similar.
On the other hand "don't make weapon systems" or "never harm anyone" might not be viable goals. Not only because they are difficult to impossible to define, but also because there is huge financial and political pressure not to limit your AI in that way (see Anthropic)