by ksarw 1542 days ago
I know they have team(s) of very smart people dedicated to solving this issue (at least at the individual level).

So assuming they care, I can think of two main reasons as to why it is not solved yet, both related to scale as Marques mentioned:

1.) Scale of the problem - It might be that they are already catching 99% of the stuff and we just see what falls through the cracks

2.) Scale of the solving - It could be that the teams and infrastructure are so large that they can't make the rapid adjustments needed to compete in such an arms race

On a separate note, I imagine a higher quality comment section would increase engagement more than any "appealing" scam.

2 comments

> I know they have team(s) of very smart people dedicated to solving this issue (at least at the individual level).

Do you actually know, or are you being generous and still trying to assume good faith from a company that has repeatedly shown it doesn't deserve it?

I don't see a business reason for them to take action. The spam comments don't open them to any legal liability (they already get away with much worse), YouTube has a monopoly so no amount of spam will drive users away, the spam contributes to engagement numbers, and the advertisers don't seem to mind.

I happen to know someone in this case, and am not assuming good faith from the company by any means. I trust and respect the individual.

I'm also generally interested in the comment moderation problem, and have been working on it myself for some time. I guess my judgement is clouded by my hope that there is a reasonable excuse for the team(s) at Google to not have solved it by now.

Perhaps it is naive of me to think this way; if it really is as simple as "this does not affect advertising revenue" then that would be quite nearsighted of Google. And, as I mentioned earlier, I am of the opinion that quality comment sections would increase engagement (and revenue as a result), so it doesn't make sense to me.

The spam comments usually contain hooks and symbols so that the other bots can latch onto them more easily. Querying for those signs to flag likely spam threads with high probability is trivial, especially given the existing libraries for this, for instance ones built on Bayesian statistics.
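To make the Bayesian idea concrete, here is a minimal naive Bayes sketch in plain Python. The training comments, the name "Mrs Anna" and the phone number are all made up for illustration, and a real system would need far more data and features; this only shows the shape of the approach, not what YouTube actually runs.

```python
import math
import re
from collections import Counter

# Toy, invented training data (hypothetical; not real YouTube comments).
TRAIN = [
    ("spam", "Thanks to Mrs Anna I earned big profit, contact her on WhatsApp +1 555 0100"),
    ("spam", "Invest with Mrs Anna on Telegram, she gave me great advice and profit"),
    ("spam", "I recommend Mrs Anna, message her on WhatsApp for trading advice"),
    ("ham", "Great video, the explanation of the camera sensor was really clear"),
    ("ham", "I disagree with the review, the battery life is worse than last year"),
    ("ham", "Thanks for the video, the comparison at the end was helpful"),
]

def tokenize(text):
    # Lowercase word/number tokens; '+' is kept so phone prefixes survive.
    return re.findall(r"[a-z0-9+]+", text.lower())

def train(examples):
    # Count tokens per class and documents per class.
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def spam_probability(text, model):
    # Multinomial naive Bayes with add-one (Laplace) smoothing, in log space.
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore tokens never seen in training
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Convert the two log scores into a normalized probability of spam.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

model = train(TRAIN)
```

Calling `spam_probability("Contact Mrs Anna on WhatsApp for profit", model)` should come out well above 0.5 with this toy data, while an ordinary comment about the video should come out well below it.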

Sure, the most hardcore spammers would most likely change tack if attacked this way, but many would also quit entirely as it becomes unprofitable to spam. If they were also to train one of their AIs or neural networks, they could catch even more spam by simply looking for post and sentence patterns. For instance, it's very common that a spam thread contains multiple references to a name: the name of the brand or investor, or whoever they are shilling. They're always giving some sort of advice in conjunction with that name. And at some point the posts almost certainly contain weird symbols to reference the WhatsApp number or Telegram channel. So no, I don't buy that this is hard to do. I think most of it is trivial.
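As a rough sketch of those thread-level patterns (a name repeated across replies, a WhatsApp/Telegram mention, a number broken up with odd symbols), something like the following would do as a first pass. The regexes, the "2 of 3 signals" rule and the example name "Anna Keller" are my own guesses for illustration, not tuned values, and the name pattern in particular will misfire on ordinary capitalized phrases.

```python
import re
import unicodedata
from collections import Counter

# All patterns and thresholds are illustrative assumptions, not tuned values.
CONTACT_RE = re.compile(r"whats\s*app|telegram", re.IGNORECASE)
# Seven or more digits, optionally separated by spaces or filler symbols.
DIGIT_RUN_RE = re.compile(r"(?:\d[\s.\-_•+]*){7,}")
# Two or three consecutive capitalized words, a crude stand-in for a person's name.
NAME_RE = re.compile(r"\b(?:[A-Z][a-z]+\s+){1,2}[A-Z][a-z]+\b")

def normalize(text):
    # NFKC folds many look-alike characters (fullwidth or mathematical
    # letters and digits) back to their plain forms before matching.
    return unicodedata.normalize("NFKC", text)

def thread_signals(comments):
    comments = [normalize(c) for c in comments]
    name_counts = Counter()
    for c in comments:
        # Count each name once per comment, so one long post cannot trip it alone.
        name_counts.update(set(NAME_RE.findall(c)))
    top_count = max(name_counts.values(), default=0)
    return {
        "repeated_name": top_count >= 3,
        "contact_mention": any(CONTACT_RE.search(c) for c in comments),
        "obfuscated_number": any(DIGIT_RUN_RE.search(c) for c in comments),
    }

def looks_like_spam_thread(comments):
    # Flag the thread when at least two of the three signals fire.
    return sum(thread_signals(comments).values()) >= 2

SPAM_THREAD = [
    "I owe my success to Anna Keller, her advice changed everything",
    "Anna Keller is the best, I made profit with her advice",
    "How can I reach Anna Keller?",
    "She is on Ｗhatsapp +1 5•5•5 0•1•0•0",
]
NORMAL_THREAD = [
    "Great video, thanks",
    "The battery comparison was useful",
    "I still prefer last year's model",
]
```

Requiring more than one signal is a deliberate choice here: any single pattern on its own would flag plenty of legitimate threads.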

So why aren't they fixing it? Well, I seriously doubt it's due to incompetence. The more likely scenario is that they already know, from earnings and statistics, that it's not losing them any paying customers. As such it's simply a matter of priority for them. And you're not it. Because you're the product, not the customer.

> Well, I seriously doubt it's due to incompetence.

I agree with you on that, as well as on taking an ML approach. Querying the hooks and symbols directly can lead to the tradeoff between false positives and missed spam that TheDong is referring to elsewhere in this comment section (to be fair, so can the ML approach, but it's more avoidable there). It is possible that the scale of it makes the minor shortcomings not so minor.
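A small sketch of that tradeoff, with invented classifier scores and an invented daily volume (both are assumptions purely for illustration): moving the threshold trades wrongly removed comments against missed spam, and even a tiny false positive rate turns into a large absolute number at scale.

```python
# Hypothetical (score, is_spam) pairs from some classifier; invented numbers.
SCORED = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.40, False), (0.30, True), (0.10, False), (0.05, False),
]

def confusion(scored, threshold):
    # Anything scoring at or above the threshold gets flagged as spam.
    tp = sum(1 for s, is_spam in scored if s >= threshold and is_spam)
    fp = sum(1 for s, is_spam in scored if s >= threshold and not is_spam)
    fn = sum(1 for s, is_spam in scored if s < threshold and is_spam)
    return tp, fp, fn

def false_positives_at_scale(daily_legit_comments, fp_per_10k):
    # Integer arithmetic: legitimate comments wrongly removed per day.
    return daily_legit_comments * fp_per_10k // 10_000

for threshold in (0.85, 0.50, 0.20):
    tp, fp, fn = confusion(SCORED, threshold)
    print(f"threshold={threshold:.2f} caught={tp} wrongly_flagged={fp} missed={fn}")
```

With these toy numbers, a strict threshold of 0.85 flags no legitimate comments but misses half the spam, while a loose 0.20 catches all of it at the cost of two wrongly flagged comments. At a hypothetical 100 million legitimate comments a day, a 0.1% false positive rate (`fp_per_10k=10`) would still mean 100,000 wrongly removed comments daily.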