| HN Mirror


by kebman 1542 days ago

The spam comments usually contain hooks and symbols so that other bots can latch onto them more easily. Querying for those signs to flag likely spam threads with high probability is trivial, especially given the libraries that already exist for this, for instance ones based on Bayesian probability and statistics.
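To make the Bayesian point concrete, here is a minimal naive Bayes sketch in Python. It is not any platform's actual filter: the tokenizer, the class, and the toy training messages are all hypothetical, and a real filter would need far more labelled data. It treats symbols like "+", "@" and "_" as tokens, so the hooks mentioned above count as evidence.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Words become tokens, and so does every individual symbol such as "+", "@" or "_",
    # since those hooks are part of the evidence the model should weigh.
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing, two labels."""

    def __init__(self) -> None:
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokenize(text))

    def log_odds_spam(self, text: str) -> float:
        """log P(spam | text) - log P(ham | text); positive means it looks like spam."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.token_counts.items():
            total_tokens = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for tok in tokenize(text):
                # Smoothed likelihood so unseen tokens don't zero out the product.
                score += math.log((counts[tok] + 1) / (total_tokens + len(vocab)))
            scores[label] = score
        return scores["spam"] - scores["ham"]

# Hypothetical toy training data, made up for illustration only.
nb = NaiveBayes()
nb.train("Thanks to Mr. John Doe my portfolio grew, contact him on WhatsApp +1 555 0100", "spam")
nb.train("I invested with Mrs. Jane Roe and got huge profits, reach her on Telegram @janeroe_invest", "spam")
nb.train("The scheduler bug was fixed in the latest release", "ham")
nb.train("I prefer Postgres over MySQL for this kind of workload", "ham")

print(nb.log_odds_spam("Contact her on Telegram for profits") > 0)  # True
print(nb.log_odds_spam("The release fixed the scheduler bug") > 0)  # False
```

In practice you'd reach for an existing implementation (for example scikit-learn's `MultinomialNB`) rather than hand-roll it; this version just shows what the probability arithmetic amounts to.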

Sure, the most hardcore spammers would likely change tack if attacked this way, but many would also quit entirely once spamming became unprofitable. If the platform also trained one of its AIs or neural networks, it could catch even more spam simply by looking for post and sentence patterns. For instance, it's very common for a spam thread to contain multiple references to one name: the brand, the investor, or whoever is being shilled. The posts always give some sort of advice in conjunction with that name. And at some point they almost always contain weird symbols to reference a WhatsApp number or Telegram channel. So no, I don't buy that this is hard to do. I think most of it is trivial.
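The patterns described above (one name repeated across a thread plus obfuscated contact details) could be checked with plain heuristics before reaching for a neural network. A rough sketch, assuming the shilled name is written with an honorific like "Mr." and that the contact hooks look like the regexes below; both the patterns and the thresholds are hypothetical and easy for spammers to dodge.

```python
import re
from collections import Counter

# Hypothetical patterns; a real filter would need many more variants.
CONTACT_RE = re.compile(r"whats\W*app|tele\W*gram|@\w{3,}|\+\s*\d[\d\s\-().]{6,}", re.IGNORECASE)
HONORIFIC_NAME_RE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.?\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)")

def thread_features(posts: list[str]) -> dict[str, int]:
    """Count the most-repeated 'Mr./Mrs. Name' mention and all contact-like hooks."""
    names = Counter()
    contact_hits = 0
    for post in posts:
        names.update(HONORIFIC_NAME_RE.findall(post))
        contact_hits += len(CONTACT_RE.findall(post))
    top_name_count = names.most_common(1)[0][1] if names else 0
    return {"top_name_mentions": top_name_count, "contact_hits": contact_hits}

def looks_like_spam_thread(posts: list[str], min_name_mentions: int = 3,
                           min_contact_hits: int = 1) -> bool:
    # Flag only when both signals are present, to cut down on false positives.
    f = thread_features(posts)
    return f["top_name_mentions"] >= min_name_mentions and f["contact_hits"] >= min_contact_hits

# Made-up example threads.
spam_thread = [
    "I was losing money until Mr. John Doe helped me trade.",
    "Mr. John Doe is the best, his advice doubled my savings!",
    "Can confirm, Mr. John Doe is legit. Reach him on WhatsApp +1 555 0100",
]
ham_thread = [
    "Has anyone benchmarked this against Postgres?",
    "Mr. Smith from the docs team fixed it.",
]

print(looks_like_spam_thread(spam_thread))  # True
print(looks_like_spam_thread(ham_thread))   # False
```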

So why aren't they fixing it? Well, I seriously doubt it's due to incompetence. The more likely explanation is that their earnings and statistics already tell them it isn't losing them any paying customers. So it's simply a matter of priority for them, and you're not it. Because you're the product, not the customer.

1 comment

ksarw 1542 days ago

> Well, I seriously doubt it's due to incompetence.

I agree with you on that, as well as on taking an ML approach. Querying for the hooks and symbols directly can lead to the false-positive vs. spam tradeoff that TheDong refers to elsewhere in this comment section (to be fair, so can the ML approach, but there it's more avoidable). It's also possible that at their scale, the minor shortcomings stop being minor.
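The tradeoff shows up with any scoring approach: lowering the flagging threshold hides more legitimate comments, and raising it lets more spam through. A small sketch with made-up scores (not real data), where each comment has a score from some hypothetical classifier and a true label:

```python
def tradeoff(scored: list[tuple[float, bool]], threshold: float) -> tuple[int, int]:
    """Flag every comment with score >= threshold.

    Returns (false_positives, missed_spam) for the given threshold.
    """
    false_positives = sum(1 for score, is_spam in scored if score >= threshold and not is_spam)
    missed_spam = sum(1 for score, is_spam in scored if score < threshold and is_spam)
    return false_positives, missed_spam

# Made-up (score, is_spam) pairs for illustration only.
scored = [(0.95, True), (0.80, True), (0.60, False), (0.55, True), (0.20, False), (0.10, False)]

for t in (0.5, 0.7, 0.9):
    print(t, tradeoff(scored, t))
# 0.5 (1, 0)
# 0.7 (0, 1)
# 0.9 (0, 2)
```

Which threshold is "right" depends on how costly a wrongly hidden comment is compared with a missed spam post, and at large scale even a small false-positive rate adds up to a lot of real comments.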
