by raywu 507 days ago

Other comments already mentioned multiple services (from OpenAI to Cleanspeak). I want to provide a high-level clarification from experience.

Moderation is a vast topic. Different services focus on different areas, such as text, images, and CSAM. Traditionally, you treat each problem area differently.

Within each area, you, as an operator, need to define the level of sensitivity for the category of offense (policies).

Some policies seem more clear-cut (e.g. image: porn) while others are harder to define precisely (e.g. text: bullying or child grooming).
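To make "level of sensitivity per category" concrete, here is a minimal sketch. It assumes your moderation service returns a score between 0 and 1 for each category. The category names, the threshold values, and the `decide` helper are all hypothetical and not taken from any particular vendor.

```python
# Hypothetical policy table: category -> score at or above which content is blocked.
# A lower threshold means a more sensitive (stricter) policy for that category.
POLICY_THRESHOLDS = {
    "image:porn": 0.80,
    "text:bullying": 0.60,
    "text:grooming": 0.30,
}


def decide(category: str, score: float) -> str:
    """Map a classifier score to an action under the operator's policy."""
    threshold = POLICY_THRESHOLDS.get(category)
    if threshold is None:
        # No policy defined for this category: send it to a human.
        return "review"
    return "block" if score >= threshold else "allow"
```

The same score can lead to different actions depending on the category: with these made-up numbers, a score of 0.35 is blocked under `text:grooming` but allowed under `image:porn`.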

In my experience, text moderation is more complex and presents a lot of risks.

There are different approaches for text moderation.

Keyword-based matching services like Cleanspeak, TwoHat, etc. are useful as a baseline but limiting, because assessing a keyword requires context. With this approach a word can be miscategorized, resulting in a false positive or a false negative. That may impact your operations at scale, or your UX if the platform requires a more real-time experience.
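A toy illustration of why keywords alone fall short. This is a deliberately naive sketch and not how Cleanspeak or TwoHat work internally; the one-word blocklist and both helper functions are hypothetical.

```python
import re

# Hypothetical blocklist with a single entry, for illustration only.
BLOCKLIST = {"kill"}


def substring_flag(text: str) -> bool:
    """Flag if any blocked word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def word_flag(text: str) -> bool:
    """Flag only if a blocked word appears as a whole token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


samples = [
    "This painkiller works",             # harmless; substring match flags it
    "You kill it on stage every night",  # harmless idiom; both matchers flag it
    "k1ll yourself",                     # abusive; neither matcher flags it
]
for s in samples:
    print(f"{s!r}: substring={substring_flag(s)} word={word_flag(s)}")
```

Whole-word matching removes the "painkiller" false positive, but the idiom is still flagged and the obfuscated spelling is still missed. Neither can be fixed without looking at context.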

LLMs are theoretically well suited to taking context into account for text moderation; however, they are also pricier and may require further fine-tuning or self-hosting for cost savings.

CSAM as a problem area presents the highest risk, though it may be more clear-cut. There are dedicated image services and regulatory bodies that focus on this area (for automating reporting to local law enforcement).

Finally, the EU (DSA) also requires social media companies to self-report on moderation actions. The EU also requires companies to provide pathways for users to own and delete their data (GDPR).

Edit: FIXED typos; ADDED a note on CSAM and DSA & GDPR