Hacker News new | ask | show | jobs
by bawolff 304 days ago
That quote doesnt seem to appear in your link.

Regardless i meant more concretely.

1 comments

Sorry it may be from the paper linked on that page.

    A strong preference against engaging with harmful tasks;
    A pattern of apparent distress when engaging with real-world users seeking harmful content; and
    A tendency to end harmful conversations when given the ability to do so in simulated user interactions.

I'm sure they'll have the definition in a paper somewhere, perhaps the same paper.