filtering objectionable content actually requires you to build a strong AI model capable of being offended itself, so it knows to hold its tongue in mixed company
edit: lest i leave this comment totally useless, the chatbot engine “chatscript” has pretty good capabilities for disambiguating word meaning and classifying the meanings into “badword” and “verybadword” - it's free/libre software and very high performance.
Or, better yet: we should all work to make sure every human being has their basic needs met and is treated with respect, and then words like this wouldn't have as much power.
I think actually solving this problem is somewhat loosely reducible to human-level intelligence in language understanding. The next best thing is a pile of patches, each one covering a new trick from increasingly creative human adversaries as we become aware of it.
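To make the "pile of patches" idea concrete, here is a minimal sketch in Python, where each patch is one small normalization step applied before any filtering. The substitution table, the function names, and the `PATCHES` list are all made up for illustration and not taken from any particular library. Every patch also has a cost: the digit mapping will mangle real numbers, and stripping separators merges hyphenated words.

```python
import re

# Hypothetical substitution table; each entry was "patched in" after
# seeing that particular trick in the wild.
LEET_MAP = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def undo_leetspeak(text):
    """Map common look-alike characters back to letters."""
    return text.translate(LEET_MAP)


def strip_separators(text):
    """Remove a single separator sitting between two word characters (b.a.d -> bad)."""
    return re.sub(r"(?<=\w)[.\-_*](?=\w)", "", text)


# The "pile": an ordered list that grows whenever a new evasion shows up.
PATCHES = [str.lower, undo_leetspeak, strip_separators]


def normalize(text):
    """Run every patch in order and return the normalized text."""
    for patch in PATCHES:
        text = patch(text)
    return text
```

Something like `normalize("B.4.D w0rd")` would then be compared against the filter instead of the raw input. The list never stops growing, which is more or less the point of the comment above.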
You should aim higher than that. After all, it was human-level intelligence that got a professor fired for saying "nèige", a Mandarin filler word, in a lecture about filler words in other languages!
A simple blocklist might just be the cheaper and easier solution. After all, blocklists were, and still are, used to filter human output as well.
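For what it's worth, a whole-word blocklist check is only a few lines. This is a rough sketch in Python, not a recommendation of specific terms: the entries in `BLOCKLIST` are placeholders, and the function names are invented for the example.

```python
import re

# Placeholder terms; a real project would maintain its own curated list.
BLOCKLIST = {"badword", "verybadword"}

# Split text into lowercase word tokens (letters, digits, apostrophes).
_WORD_RE = re.compile(r"[a-z0-9']+")


def find_blocked(text, blocklist=BLOCKLIST):
    """Return the blocked terms that appear as whole words in text, sorted."""
    words = _WORD_RE.findall(text.lower())
    return sorted(set(words) & set(blocklist))


def is_allowed(text, blocklist=BLOCKLIST):
    """True when no blocked term appears as a whole word."""
    return not find_blocked(text, blocklist)
```

Matching whole words rather than substrings avoids flagging innocent words that merely contain a blocked term, at the price of missing trivially obfuscated spellings.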
And fail-safe. A person with a hobby project does not have a legal/PR department to deal with the consequences of an AI having a bad day.
https://github.com/ChatScript/ChatScript/