Hacker News
by nailer 3510 days ago
Sure, but there are libraries to identify visually similar words. And yes, you could fight them by being even more obscure, but at some point the words would become almost unreadable.

How about a Bayesian approach, like email spam filtering?

Contains masked content +3

New account +4

No profile pic +5

Follows accounts with low reputation +4

That totals 16, which is greater than our imaginary threshold of 10, so maybe hide the mention from whoever's timeline?
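For illustration, the scoring idea above could be sketched roughly like this. The weights and the threshold are the example numbers from the comment; the signal names and function names are made up for the sketch and don't come from any real moderation system.

```python
# Example weights taken from the comment above; the key names are hypothetical.
WEIGHTS = {
    "masked_content": 3,
    "new_account": 4,
    "no_profile_pic": 5,
    "follows_low_reputation": 4,
}
THRESHOLD = 10  # the "imaginary threshold" from the comment


def score(signals):
    """Sum the weights of the signals observed on an account."""
    return sum(WEIGHTS[name] for name in signals)


def should_hide(signals):
    """Hide the mention when the total score exceeds the threshold."""
    return score(signals) > THRESHOLD
```

An account showing all four signals scores 3 + 4 + 5 + 4 = 16 and would be hidden; one showing only masked content on a new account scores 7 and would not.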

That's not a Bayesian approach.
There are degrees of belief attached to each aspect, and the proposal works exactly like existing approaches that are acknowledged as Bayesian, e.g. SpamAssassin. So how isn't it?
A Bayesian approach would learn the degrees of belief empirically from examples of spam and "ham" messages.

Other machine learning approaches would also learn from data; what makes an approach specifically Bayesian is that it uses Bayes' theorem.
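To make the distinction concrete, here is a minimal naive Bayes sketch that learns from human-labeled examples and applies Bayes' theorem to get P(spam | signals). Everything in it (function names, the set-of-signals representation, Laplace smoothing, the independence assumption) is an assumption for illustration, not how SpamAssassin or any particular filter is implemented.

```python
from collections import Counter


def train(examples):
    """Count signals per class. examples: list of (set_of_signals, is_spam)."""
    totals = Counter()
    counts = {True: Counter(), False: Counter()}
    for signals, is_spam in examples:
        totals[is_spam] += 1
        counts[is_spam].update(signals)
    return totals, counts


def posterior_spam(signals, vocabulary, totals, counts):
    """P(spam | signals) via Bayes' theorem, assuming independent signals."""
    n = totals[True] + totals[False]
    joint = {}
    for label in (True, False):
        p = totals[label] / n  # prior P(label)
        for name in vocabulary:
            # Laplace-smoothed estimate of P(signal present | label)
            p_present = (counts[label][name] + 1) / (totals[label] + 2)
            p *= p_present if name in signals else (1 - p_present)
        joint[label] = p  # P(signals | label) * P(label)
    # Normalize over both classes to get the posterior for spam.
    return joint[True] / (joint[True] + joint[False])
```

The point of the sketch is that no weight is typed in by hand: every probability is estimated from the labeled examples passed to train.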

Well yes, you'd base those weights on actual, human-confirmed account bans.

Nothing about my post ever suggested the value was arbitrary. You've just assumed that.

The inclusion of arbitrary values suggested you would actually use those values.
The example values were included only as an example. Nothing was stated about where they were derived from. It would be logical to derive them from actual frequencies in real-life moderation scenarios. Please don't make bad faith assumptions.
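One way the two positions could meet, sketched under assumptions: additive weights like the ones in the original comment can be derived from observed frequencies as log-likelihood ratios, which is what a naive Bayes filter effectively sums. The function name, the input format, and the smoothing are hypothetical choices for this sketch, and it scores only signals that are present, which is a simplification.

```python
import math


def log_odds_weights(banned, legit, signal_names):
    """Derive additive weights from human-confirmed outcomes.

    banned, legit: lists of signal sets for confirmed-banned and
    confirmed-legitimate accounts. Each weight is
    log(P(signal | banned) / P(signal | legit)), Laplace-smoothed.
    """
    weights = {}
    for name in signal_names:
        p_banned = (sum(name in s for s in banned) + 1) / (len(banned) + 2)
        p_legit = (sum(name in s for s in legit) + 1) / (len(legit) + 2)
        weights[name] = math.log(p_banned / p_legit)
    return weights
```

A signal that is equally common among banned and legitimate accounts gets a weight of 0, and one that is more common among banned accounts gets a positive weight, so summing the weights and comparing against a threshold stays the same shape as the hand-written scheme.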