by notadoc 3396 days ago
There are massive bot farms on Twitter that are obvious to even casual observers, and most of them don't have "egg" accounts anymore. Look at the auto-replies to any political tweet, major news story, or tweet from a popular account: all automated garbage that follows the same template.

Why Twitter ignores the crap which pollutes their product is amazing to me. Maybe they don't want to touch them because it ups the engagement numbers and inflates active users?

5 comments

What tools do people use to administer these? Since Twitter and other platforms seem to have no interest in fixing the problem I figure I might as well stop trying to abide by their ToS and leverage the technology for my own agenda.
What needs to happen is that they open up all the blocked account information for all users, and show when users that are blocked by X many users have deleted tweets.

The problem, as I'd define it, is that Twitter allows viral communication between humans, and that this is being exploited in a very specific way by a small group of people who understand how to spread dissonant ideas by intentionally using polarized, meme-based arguments that are themselves viral in nature. It's like a logic "jingle" you get stuck in your head. Or a virus.

It's on Twitter to fix the Pandora's Box they have opened with this type of infrastructure. (And I'm still sticking to my claim that Twitter is infrastructure given how much it can govern our behavior.)
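
To make the "blocked by X many users" idea above concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `(blocker, blocked)` pair format, the `(account, tweet_id)` deletion records, and both function names are hypothetical, and nothing here reflects data Twitter actually exposes.

```python
from collections import defaultdict


def accounts_blocked_by_at_least(blocks, threshold):
    """Return the accounts blocked by at least `threshold` distinct users.

    `blocks` is an iterable of (blocker, blocked) pairs. This shape is
    hypothetical; it stands in for whatever block data a platform might open up.
    """
    blockers = defaultdict(set)
    for blocker, blocked in blocks:
        # A set means repeated blocks by the same user only count once.
        blockers[blocked].add(blocker)
    return {account for account, who in blockers.items() if len(who) >= threshold}


def deletions_by_flagged_accounts(deleted_tweets, flagged):
    """Keep only the deletion records that belong to heavily blocked accounts.

    `deleted_tweets` is an iterable of (account, tweet_id) pairs (assumed format).
    """
    return [(account, tweet_id) for account, tweet_id in deleted_tweets
            if account in flagged]


blocks = [("u1", "spam"), ("u2", "spam"), ("u2", "spam"), ("u3", "spam"), ("u1", "ok")]
flagged = accounts_blocked_by_at_least(blocks, threshold=3)
print(flagged)
print(deletions_by_flagged_accounts([("spam", 101), ("ok", 102)], flagged))
```

The threshold is the "X" from the comment; picking a sensible value would depend on data that isn't public.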

OK, but any ideas on the tool front? I'm not being facetious, I've just decided that since it's a de facto free-for-all I might as well retaliate against bad actors rather than waiting for people to grow a conscience.
If you're seriously asking: for the most part the tools are in-house. There isn't a single widely used framework (to my knowledge...) that covers everything. Basically, you'd use a mix of something like redis, beautifulsoup, digitalocean, ansible, a fleet of shady proxies and various middleware for everything. Throw in postgres too.

If you're not using the API, all you're doing is programmatically signing up for accounts, storing the corresponding login credentials, writing a little library that assigns all outgoing bot activities to a user-agent and proxy, queues their activities and executes them. Naturally you have a little library that manages the random profile information creator - collect CSVs from data.gov such that you can create convincing random names and addresses, then use random profile pictures from Google images.

The secret sauce is not in the orchestration (someone could wrap all this up into a framework pretty easily), it's in the structuring of activities so that you don't get caught by e.g. having your bots follow each other incestuously. That's a rather less automated process and requires active vigilance and tweaking.

What I have seen work in the past is partitioning the botnet such that blocks of them slowly establish signal-to-noise credibility in specific niches before intermingling. Many botnet creators attempt to create a multiplier effect, recursively improving their aggregate signal score by having the bots in a massive echo chamber with each other. This is easily caught.

Bot block A should have a few thousand from around the country commenting on the election and sometimes posting memes from reddit. Bot block B should retweet thinkpieces from the tech industry and be the first submitter of various obscure but passable Medium articles. And so on, and so forth. Basically, have the bots act like humans who mostly talk about one category of thing on Twitter, but who still have enough nuance to not seem spammy.

Once your botnet hits a critical mass, you no longer need to do this as strictly in the future. You can spin up new bot blocks quickly and have the mature blocks retweet and interact with them to bring them up to the requisite signal score more quickly. At this point you can capitalize on trends and have a credible mass of followers influencing a conversation on Twitter within hours or days of the trend emerging. As it snowballs, you compound the other blocks to simulate rippling popularity throughout the system (i.e. trends become less about one niche and more about everyone, like Uber->Uber's Sexual Harassment Scandal->Sexual Harassment).

The other secret sauce is in successfully managing the network on a budget, because while all those proxies are what diversify your botnet's origins, individual proxies typically score negatively for websites actively looking to reduce bots.
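
On the point above that a mutual echo chamber "is easily caught": from the platform's side, one crude signal would be how much of a group's following stays inside the group. The Python sketch below is only an illustration of that idea. The `(follower, followee)` edge list and the function name are assumptions, not anything Twitter is known to run, and a real detector would combine many more signals.

```python
def internal_follow_ratio(follows, group):
    """Fraction of the group's outgoing follows that point back into the group.

    `follows` is an iterable of (follower, followee) pairs (assumed format).
    A ratio near 1.0 suggests accounts that mostly follow each other.
    """
    group = set(group)
    inside = 0
    total = 0
    for follower, followee in follows:
        if follower in group:
            total += 1
            if followee in group:
                inside += 1
    # A group with no outgoing follows gets 0.0 rather than a division error.
    return inside / total if total else 0.0


follows = [("a", "b"), ("b", "a"), ("a", "c"), ("b", "c"), ("c", "x")]
print(internal_follow_ratio(follows, {"a", "b", "c"}))
```

Accounts that follow a broad mix of outside users would score low on this measure, while a closed cluster scores high.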

Surely they can recognize the downsides of a bubble of inflated engagement metrics waiting to burst? Willful ignorance?
"500.000 new Members last month!!1" (I just totally made that up)

I suspect it's the same as Facebook's "2 Billion Users": it's just good for PR to have huge numbers. If you look too closely you might even lose members in a month, and we all know "growth" is very important.

You know, I could see ol' Zuck sitting there in his Herman-Miller Aeron chair looking at the internal numbers of FB 'users' and seeing the number be ~15 billion accounts that their in-house 'bot filters still think are 'real' people. He says to himself: 'You know, maybe I could tell the UN that there really are 15 billion people out there and they are all on FB. Ha, I mean, a lot of people really might believe me.' He looks out the window, sighs, and puts out an email via Thunderbird to the marketing team leads that says to keep the number at 2 billion.
Do you have a blog? Genuine question.
Nah, it's best to keep the ranting low-key and not tied to me personally. If you see my comment karma, I say enough dumb stuff already ;)
Facebook and I believe Twitter report Monthly Active Users. This isn't some smoke and mirrors number. Sure, bots may count in this number, but they aren't referencing total signups when they say "Users" like you imply.
> This isn't some smoke and mirrors number. Sure, bots may count in this number…

They do, which is why it's reasonable to consider it a smoke and mirrors number.
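
To put the disagreement above in concrete terms: monthly active users counts distinct accounts with activity in a recent window, which is a different number from total signups, and bots inflate it unless they are filtered out. A minimal Python sketch, assuming a hypothetical event log of `(user, date)` pairs and a hypothetical set of accounts already flagged as bots (neither company publishes how it actually computes this):

```python
from datetime import date, timedelta


def monthly_active_users(events, as_of, window_days=30, exclude=frozenset()):
    """Count distinct users with at least one event in the trailing window.

    `events` is an iterable of (user, day) pairs (assumed format).
    `exclude` is an optional set of accounts to leave out, e.g. flagged bots.
    """
    start = as_of - timedelta(days=window_days)
    active = {user for user, day in events
              if start < day <= as_of and user not in exclude}
    return len(active)


events = [
    ("alice", date(2017, 6, 20)),
    ("bot1", date(2017, 6, 25)),
    ("bob", date(2017, 3, 1)),    # signed up and acted once, long ago
    ("alice", date(2017, 6, 28)),
]
as_of = date(2017, 6, 30)

total_signups = len({user for user, _ in events})
print(total_signups)
print(monthly_active_users(events, as_of))
print(monthly_active_users(events, as_of, exclude={"bot1"}))
```

The gap between the last two numbers is the part both commenters are arguing about.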

I'd like to see country of origin for tweets/accounts. I know this can be worked around with a vpn, but it's a start.
That would be wherever the cheapest servers can be bought. Or just random open proxies.
"Why Twitter ignores the crap which pollutes their product": this is easy to answer. Investors. Investors who don't care about anything but vanity metrics, investors who don't hold CEOs accountable. That is why, and Twitter is not the exception.
Sorry, but this explanation makes no sense. $TWTR has been a public company since 2013, with investors who emphatically do care about things other than vanity metrics.

A list of the largest Twitter shareholders[1] includes Blackrock, Vanguard, Fidelity, Morgan Stanley etc. These folks really don't care about "engagement metrics", except as proxies for revenue.

This argument can hold for earlier-stage companies with investors hoping to sell at inflated valuations to a greater fool later in line, but once you've been publicly held for several years, it's quite a stretch. Whatever's hurting Twitter's ability to fix its problems, it isn't "investor pressure".

[1] http://investors.morningstar.com/ownership/shareholders-majo...

Twitter was a private company at some point, right? This is something that should have been fixed years ago.