Have you ever tried to fight spam at scale? It’s trivial to block the 50% of spam accounts that are obvious, but we are at the stage of humanity where the activity of very dumb humans and the activity of very clever bots EASILY overlap. A lot.
Bots use proxies with residential IPs. They maintain session info (cookies, User-Agents, etc.), they move the mouse and introduce jitter, and their access patterns aren’t random but engineered to follow the day/night hours that match the geo-data of the IP they are using. They solve captchas at the same rate that humans do (time, error rate).
Some human accounts look like bots. They sign up and upload a couple of files which immediately get high traffic. They use NordVPN, so their public IP is shared by thousands of “known bots”, and their access patterns are weird and unpredictable.
Yes, you can use machine learning to try to identify these, but then you end up with false positives and real people having problems, like the poster above.
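To put rough numbers on the false-positive point, here is a minimal sketch. Every figure in it (account counts, recall, false-positive rate) is made up for illustration, and `flag_outcomes` is a hypothetical helper, not anything from a real system.

```python
def flag_outcomes(spam_accounts, legit_accounts, recall, false_positive_rate):
    """Return (spam caught, legit accounts wrongly flagged, precision of a flag)."""
    caught = spam_accounts * recall
    wrongly_flagged = legit_accounts * false_positive_rate
    precision = caught / (caught + wrongly_flagged)
    return caught, wrongly_flagged, precision


# Hypothetical service: 1,000,000 accounts, 5% of them spam.
# Hypothetical classifier: catches 95% of spam, misfires on 1% of real users.
caught, wrongly_flagged, precision = flag_outcomes(
    spam_accounts=50_000,
    legit_accounts=950_000,
    recall=0.95,
    false_positive_rate=0.01,
)

print(f"spam caught:          {caught:,.0f}")
print(f"real users flagged:   {wrongly_flagged:,.0f}")
print(f"share of flags right: {precision:.1%}")
```

Even with a classifier that sounds very good on paper, these assumed inputs leave about 9,500 real people locked out, roughly one in six of everyone flagged.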
And on top of all that, bots are constantly subverting detection, so whatever solution you have now won’t work next week.
No. It's not just a matter of writing the code. To fake six months of legitimate use, the spammer also needs to run the code for six months, which takes a lot of resources.
To avoid detection, you can't just make API calls for six months. You have to run the official client on the machine for six months, and then the official client can collect more data on your usage pattern. Imagine the cost of running hundreds or thousands of those accounts.
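A back-of-the-envelope version of that cost, as a sketch only. The $5 per account per month is a number I made up to stand in for whatever it costs to keep one official client running (VM, residential proxy, and so on), and `farm_cost` is a hypothetical helper.

```python
def farm_cost(accounts, months, cost_per_account_month):
    """Total cost of keeping `accounts` clients running for `months` months."""
    return accounts * months * cost_per_account_month


# Assumed, not measured: $5 per account per month.
COST_PER_ACCOUNT_MONTH = 5

for accounts in (100, 1_000, 10_000):
    total = farm_cost(accounts, months=6, cost_per_account_month=COST_PER_ACCOUNT_MONTH)
    print(f"{accounts:>6,} accounts for 6 months: ${total:,}")
```

The cost scales linearly with both the number of accounts and the length of the waiting period, so a longer required history raises the spammer's bill in direct proportion.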
If you come up with an algorithm to reliably and enduringly distinguish a spam account from a legit account, you will be a billionaire. It's the 20-foot wall, 21-foot ladder problem.