It's more of a research program than a product feature. No one knows how to fully prevent a model from responding based on what's in its base training data, which is what you're seeing with jailbreaks.

And tackling one of the roots of the issue, the base training data, comes with its own set of unsolved challenges, not least the unavoidable subjectivity of what is or isn't "safe".