Hacker News
by ActsJuvenile 3065 days ago
Twitter is in a dire situation. As a fun project I wrote a Lua - Torch bot to search for certain tweets and hit like on them based on sentiment analysis.

I realized that API query results were mostly news bots, retweet bots, corporate PR bots, social media aggregator platforms like Buffer, and just plain old spam bots.

How bad was it? After filtering 1,000 tweets per query, I barely found 10-20 real human users. That signal-to-noise ratio is dismal, and detrimental to the core product experience. Twitter must be forced to maintain this fake high activity to prop up the share price.

BONUS: Guess who else is spamming their post feed: Tumblr. Tumblr didn't allow any adult content or keyword search; since Marissa Mayer took over she seems to have loosened that policy to fluff the numbers. Tumblr today is drowning in porn.

13 comments

> BONUS: Guess who else is spamming their post feed: Tumblr. Tumblr didn't allow any adult content or keyword search; since Marissa Mayer took over she seems to have loosened that policy to fluff the numbers. Tumblr today is drowning in porn.

...what?

Tumblr was known for porn long before it sold to Yahoo. If anything, Tumblr started cracking down on blogs with adult content afterwards (for example, requiring users to log in before visiting them). There was a huge backlash from artists and bloggers with non-pornographic gay-themed content, because many of them were caught in the ripple effects from these changes.

I had friends who worked at Tumblr before 2012, and the running joke was that Tumblr was 50% porn and 30% pictures of cats. (Don't take those figures too seriously, but clearly porn was a large part of Tumblr long before the acquisition).

Also, Tumblr definitely did allow adult keywords in searches. I know this because of a rather unfortunate incident at a student hackathon my company sponsored, in which a student thought it would be a "funny" idea to search for a risque phrase when demonstrating his weekend hack (an aggregator of Tumblr posts).

The porn bots have only gotten worse, though. In 2012 it was at least mostly human-run porn blogs, and they didn't start following you if you were just a regular blog.
That's true and it needs to be dealt with, but OP definitely is speaking outside of their domain of knowledge because Tumblr was always drowning in porn. OP didn't specifically mention bots which leads me to believe they are speaking of porn in general.

At boarding school many moons ago, we all used Tumblr to get our rocks off because most porn sites were blocked.

One of Tumblr's differentiators is allowing adult content. It's very hard to find places friendly to that when you're looking to distribute your artwork and get a following going.

Thankfully more and more websites are based out of other countries now and don't need to swim in the puritan current of the US.

Try hosting such artwork out of India.
I have spent several years working on a product (https://www.rapidcrowd.co) that cuts through the noise (bots, fake accounts, inactives) of Twitter to find real users that fit related topics - and your rough estimate of 20 real users in 1000 tweets isn't too far off.

For this reason, trending topics and keyword search are essentially hijacked features.

However, because I believe there are many useful bots that have organic followings in the millions - I don't believe they need to be simply removed from the ecosystem.

Instead, my suggestion would be a 'bots' account type. Some ideas:

- A robot version of the 'blue checkmark'. This would allow users to quickly identify a tweet as sent from a bot.

- This account type could be linked to a real owner's account, much like Twitter apps are. Accounts that are flagged as bots but have not registered as one could be subject to deletion.

- Bots would automatically receive low ranking in search queries, and trending topics. Perhaps they would be completely delisted.

- Bots cannot follow other users.

- Bots cannot tweet at other users.

More extreme:

- Bots cannot tweet without some sort of spend. Maybe they can only tweet in some ratio from real Likes they receive. This is a bit extreme, but would mitigate a lot of problems.
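The rules listed above could be sketched as a small policy model. This is only an illustration of the proposal, not any real Twitter API: the `Account` class, its field names, the `LIKES_PER_TWEET` ratio, and the `bot_penalty` factor are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ratio for the "more extreme" rule: one tweet per 10 human Likes.
LIKES_PER_TWEET = 10


@dataclass
class Account:
    handle: str
    is_bot: bool = False
    owner_handle: Optional[str] = None  # a registered bot links to a real owner's account
    likes_from_humans: int = 0
    tweets_sent: int = 0


def can_follow(account: Account) -> bool:
    """Bots cannot follow other users."""
    return not account.is_bot


def can_mention(account: Account) -> bool:
    """Bots cannot tweet at other users."""
    return not account.is_bot


def can_tweet(account: Account) -> bool:
    """Humans tweet freely; bots need a registered owner and earned budget."""
    if not account.is_bot:
        return True
    if account.owner_handle is None:
        return False  # bot without a linked owner account
    earned = account.likes_from_humans // LIKES_PER_TWEET
    return account.tweets_sent < earned


def search_rank(account: Account, relevance: float, bot_penalty: float = 0.1) -> float:
    """Bots are ranked lower in search; bot_penalty=0.0 would delist them."""
    return relevance * bot_penalty if account.is_bot else relevance
```

One gap in this sketch: a brand-new bot with no Likes can never tweet, so a real scheme would need some starting allowance.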

I really believe a happy medium could be found. I currently think that a well-curated Twitter timeline is amazing, but as I stated, search results and trending topics are completely broken.

When I first joined Twitter a few years ago I tried the ‘search near me’ feature a few times.

Weather bots. For any city within 100 miles of where I am. Plus bots posting job listings. Plus companies posting those same job listings.

There was basically no signal to find, it was all noise. The few ‘legitimate’ ones I found were from local PD/FD.

I think they just need to ban all bots/automated postings. Or make them filterable and require a $100/mo account and $1/tweet. Something to discourage the absolute garbage.

But these bots exist because some people actually use Twitter as a newsfeed for that sort of thing. And being partly an RSS substitute is surely part of Twitter's business model these days. Not suggesting that many of these accounts weren't still absolute garbage, but the signal/noise ratio isn't always great with obviously human-run accounts either.

The other issue is that the distinction between a bot and a human isn't clear-cut: there are plenty of shades of grey between something which spits out badly-scraped listings all day and an actual human having noncommercial conversations with Twitter friends. It's more a matter of classifying what's bad behaviour (using the pornbot tactic of mass follow/unfollow to attract attention, even if you're a human marketer tweeting actual content and haven't written a script to do it) and what's perfectly acceptable automation, like tweet schedulers that could use some work.
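To make "classify the behaviour, not the account" concrete, here is a minimal sketch of one possible follow/unfollow churn measure. The event format, the `churn_ratio` name, and the three-day window are assumptions for illustration; nothing here reflects how Twitter actually detects this.

```python
from datetime import datetime, timedelta


def churn_ratio(events, window=timedelta(days=3)):
    """Fraction of follows that were undone within `window`.

    `events` is a list of (timestamp, action, target) tuples, where
    action is "follow" or "unfollow". The result is the same whether a
    script or a human performed the actions.
    """
    followed_at = {}
    follows = 0
    churned = 0
    for ts, action, target in sorted(events):
        if action == "follow":
            followed_at[target] = ts
            follows += 1
        elif action == "unfollow" and target in followed_at:
            if ts - followed_at.pop(target) <= window:
                churned += 1
    return churned / follows if follows else 0.0
```

An account that follows a thousand users and drops most of them within days would score near 1.0, while a tweet scheduler that never follows anyone scores 0.0.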

>But these bots exist because some people actually use Twitter as a newsfeed for that sort of thing.

I would dispute that. I'd argue that the bots exist because spamming twitter is free. If it costs me $0 and I get even a tiny benefit out of it then it is to my advantage.

It's essentially the same problem as spam email.

We tweet the scores of our games automatically from our backend. There's no profit in it, but our members appreciate it based on follows and retweets. That's real people following and retweeting. I actually comb through and remove obviously fake accounts from following us.

I think this is a legitimate use of a bot. We even mention the host club because they want us to.

What's funny is that the account got squelched 3 times before we got a human at twitter to officially prevent us from getting flagged. So they do definitely have some measures in place to prevent spam accounts. I suspect it's become non-trivial to identify all the bad actors.

That sounds fair. Maybe it’s more of a volume issue. You get one or two ‘bot tweets’ per day without paying.

I know lots of people also use bots for cross posting from Instagram or something else. Or to post when they put a new article up on their site.

I’m sure you’d have to allow them to some degree. But there are some really noisy bots out there that need a fee attached to em.

I was signing up for your service until I saw the permissions your app wants. Permission to update my profile? See my DMs? It doesn't explain anywhere I could find why it needs these permissions.
We do not have any application functionality relating to modifying profiles.

We do request access so you can send Direct Messages from your dashboard. We are considering removing that functionality for the sake of privacy.

Unfortunately, the Twitter application permission model is not granular:

https://developer.twitter.com/en/docs/basics/authentication/...

We would prefer to just have write functionalities and not read, but this is not possible in their model.

How is what you are describing any different from sponsored tweets?
You don't have to follow a bot, so you will never see their tweets in your timeline. You can't remove a sponsored tweet from your timeline.
I haven't used an official Twitter client for a while, but you certainly used to be able to block the account making a sponsored tweet, which will remove it from your timeline.
A lot of Tumblr porn is actually very specifically (and astutely) curated. The few times I've visited, it's actually a much nicer experience than porn websites. If anything, I think Tumblr would be wise to lean into being, among other things, a space for the sharing of erotic content. Maybe there's some spam but it's pretty easy to find your way into non-spam on Tumblr in my experience.
Isn't that a bit thorny since most of the content is copyrighted...
My understanding of the Tumblr porn content is that they’re usually small clips and I believe under so many seconds doesn’t count under copyright infringement laws. However, nothing would be a problem if it was original content that had waivers and age of consent forms and such on file.
> I believe under so many seconds doesn’t count under copyright infringement laws

Careful here. This appears to be referring to the concept of fair use, but firstly there are multiple criteria used to judge fair use (https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors), secondly these criteria are (intentionally) subjective and up to a judge's interpretation, and thirdly fair use is US-only.

The term fair use is US-only but many nations have similar concepts or concepts that overlap with it. Australia's 'fair dealing' policy allows for the use of copyrighted material without seeking approval if its purpose is in satire, research, reviewing, media criticism, or news reporting, for example. What's notable there is that length or amount used aren't as important, which has some positive effects but also some important negative ones (it would not be possible to create Google in Australia because taking the summary snippets and image thumbnails has no legal justification). Interestingly in the last big debate over loosening Australian copyright law and adopting broader fair use, the American MPAA was the biggest funder of opposition efforts.
> thirdly fair use is US-only.

That seems fine. In the interest of their userbase, companies should aim for the least copyright-damaged user experience possible; this means picking a single country (ideally one with liberal copyright law) and ignoring copyright law in other countries they aren’t based out of. If countries want to force their censorship standards, they can at least be honest about it and block the website (rather than silently deflecting the responsibility of censorship to the website itself).

Not your problem when the content is user-uploaded.
tell that to piratebay
No.

They brought the heat on themselves when they decided to mock DMCA requests and cease and desists instead of accommodating them.

There's a reason 4chan, Reddit, imgur, Tumblr and its ilk all still exist in the age of copyright. None of them produce their own content-- it's all user submitted, and mostly in violation of some copyright or another.

Generally speaking, my understanding is that hosts aren't liable for user-uploaded content unless they curate or promote it (deletion notwithstanding). That changes their role to that of a content distributor/publisher instead of a mere platform. This is why backpage's CEO got arrested for human trafficking whereas craigslist's did not-- when challenged craigslist shut down its prostitution ads, but Backpage actively reworded and posted them and in doing so became their publisher.

The content uploaded to the pirate bay wasn't (isn't) copyrighted either.
Most on porn sites is, though.
Most on the internet too. Every image meme and reuploads of other images are basically also copyright violations.
This doesn't mean a company Tumblr's size can ignore it. Reddit, Google, Facebook, et al all need to consider these things as they operate.

I'm not saying they can't skirt the laws a little bit and get away with it, all I'm saying is they need to have awareness.

Nah, they just need to be vigilant about DMCA requests.

Due to the Safe Harbor rules in US copyright laws they don't really need to remove copyrighted content proactively. There's tons of subreddits exclusively dedicated to piracy that they don't care about.

Everything you say, and everything you think, is a copyright infringement. Pay up!
I've yet to figure out why anyone would consume any kind of content on Twitter.

I've used it for a while, and what I got is that it's good for (and people use it for) spamming others about your projects or showing off. However, if you try to use it to get news or updates on anything it is the least efficient, most stressful thing I've ever used.

I see Twitter as a good tool for outages, natural disasters, and protests. That's pretty much it.

> However, if you try to use it to get news or updates on anything it is the least efficient, most stressful thing I've ever used.

Depends heavily on the set of people you follow. I've found it to be a great source of news, and I typically see news show up there hours to days before I see it show up in places like HN.

For instance, if you follow sports, it's an extremely efficient and direct way of keeping up-to-date with teams and players.
Twitter is the most awesome, amazing source of news and updates I've found. For example because I follow Tavis Ormandy on Twitter I know that a vulnerability related to bittorrent will soon be released by Google's Project Zero.

That said, it took months to get my feed to where it is today. It's not easy for each person to find the mix of accounts that is best for them. My recommendation is, be fast to follow folks who look interesting, and fast to unfollow folks if they are boring or you don't like them. When you find folks you like, see who they retweet, reply to, follow, etc. and follow all those folks to see what you think.

Are there services out there which curate the available channels to ensure quality and content? Say if I wanted to follow a particular sport or team, are there services which can set it up? Same goes for any subject.
That would be very subjective. But it probably exists. I am a heavy user of Twitter's List feature, and have columns in Tweetdeck for friends, local, sports, politics, colleagues etc. I try to curate these for my interests as best as possible, i.e. weed out overly noisy tweeters etc.

I guess you can find other people's lists and follow them? E.g. a sports journalist's sports lists etc. I have not tried to do that so not sure if there are hurdles to overcome. And maybe someone can aggregate these public lists for others to find/follow?

> Say if I wanted to follow a particular sport or team, are there services which can set it up?

It's called ESPN.

ESPN only really cover one country. If you're in the US I'm sure it's fine but for everyone else the coverage is not only almost nonexistent but often factually wrong when it does exist. I saw AFL coverage on ESPN where they did not understand the distinction between goals and points and left the wrong scores up for ages.
I use it to follow a relatively small (a few dozen) group of mostly computer scientists with a few cooks/chefs and basketball writers. At this level it takes maybe 1-2 minutes to read an entire day's feed, and I usually end up with a few interesting things to read that I might not have seen otherwise. I have also found it useful at conferences to follow what's happening.

I have no idea how people follow more than say a hundred people profitably.

Yes, no idea.

What I find awful is that HN and Reddit save me time by sorting out low-quality content, while on Twitter you're the one that has to work hard to do it (if you can), and Twitter itself just adds more noise in the meantime.

I use it to follow artists, local media personalities, local journos, etc. to get live information from them.

This is particularly relevant for hyper-local media which basically has zero media footprint outside of dead-tree newspapers and talk radio.

Basically: follow your city councilor on Twitter. Start from there.

The only reason I use twitter is because it's where the forum culture of the 2000s has migrated. All the good SA posters pretty much only use twitter now, so it's the only place for that sort of content.
What's "SA"?
SomethingAwful - the accounts they meant are probably guys like @arr, @livestock, @dogboner, @sexyfacts4u, @dril
Yup, thanks.
I use it mainly to keep up with streamers and fellow artists, along with a good amount of news from various aggregators. I also have interesting conversations on there occasionally.

Twitter is what you make it. I'm sincerely worried their VC-backed, top-heavy company is going to topple over and carry with it to oblivion a service that could probably run on 1/20th of its infrastructure.

For sports, it's the fastest way to get news - if you follow the right people.
I was using it to get notified every time my favorite columnist put up something new. Unfortunately, the publisher started requiring registration to read stuff.

But I see it as a way to keep track of what a columnist, journalist, or public figure is doing.

Twitter is quite excellent for some communities. For Javascript and politics, for example (and also javascript politics), Twitter has become the place where the news actually happens, instead of just being disseminated.
Or it could be that Twitter cares about fake account impact mainly in terms of user experience. If a fake bot likes a tweet from another fake bot, does any human actually care?

I used to have a much bigger problem on Twitter with fake accounts and bot likes than I do now, and despite my decades of willingness to bitch about spam, I have to concede that they've gotten better lately.

There's also a user benefit to letting bots run when they're not hurting anyone: you don't give hints to the spammers on how you're finding and nuking them. Indeed, if you identify them but don't block them, you can get a lot of data on what is spam. For example, on my mail server, I noticed I was getting a lot of dictionary attacks. So I took the hundred most common first names not in use on my domain and fed them all into the spam training system. That means odds are very good that my spam trainer will have seen a piece of spam before they try my actual account.
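The mail-server trick above could be sketched roughly as follows. The function name, the sample name list, and `example.org` are assumptions for illustration; the comment doesn't say which mail server or spam filter was used, so the step of feeding mail into the trainer is left out.

```python
def honeypot_addresses(common_names, in_use, domain, limit=100):
    """Build trap addresses from common first names no real user has.

    Anything delivered to these addresses can be treated as spam and
    passed to whatever spam-training system the server uses.
    """
    taken = {name.lower() for name in in_use}
    unused = [name.lower() for name in common_names if name.lower() not in taken]
    return [f"{name}@{domain}" for name in unused[:limit]]


# Usage: "mary" is a real mailbox, so it is skipped.
traps = honeypot_addresses(["James", "Mary", "John"], {"mary"}, "example.org")
print(traps)  # ['james@example.org', 'john@example.org']
```

Because a dictionary attack usually walks through common names, the traps are likely to receive a given spam run before or alongside the real mailbox.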

You get to see a flood of spambots if you make the mistake of clicking any of the trending hashtags, but otherwise yeah it's a non-issue for the vast majority of Twitter usage.
> If a fake bot likes a tweet from another fake bot, does any human actually care?

Very nice!

What's crazy is that bots are not violating Twitter TOS. I reported a bunch of fake accounts I found and was told they aren't breaking any rules, even ones just posting spammy links.
Well, yeah. If they're just tweeting and not @ing at other users, they can't really "spam" anyone since only their followers, who choose to see them, will see those messages.

Twitter does take a pretty dim view of DM and @reply spam though because it is genuinely annoying and not opt-in.

Is follow spam targeted? Because most of my interaction with obvious bots is follow-churn.
The trick with Twitter is that basically these bots are invisible unless you search hashtags or read the replies to a popular person who is spammed by them. It's not like anybody sane would follow these bots.

Twitter's model is basically "shouting at the universe" but if nobody listens it doesn't matter.

Problem being investors do listen to these metrics and nobody really knows how bad the noise is in that data being shared.

I 100% guarantee twitter knows the exact ratio of real to unreal users and their content.

They're also trying to slowly crack down on bots.

A year ago they insta locked your account if it looked like automated liking of content.

Now they're insta locking if you look like a follow/unfollow bot.

It's a balancing act and I feel like it's analogous to getting out of extreme debt. They're trying to replace bots as their real user base grows internally.

Using Twitter for search/discovery by hashtag or keyword is totally pointless.

I follow specific people that I've discovered outside of the network, or by referral, and as a result my feed is basically "all signal".

This seems like the only reasonable way to use Twitter now.

>After filtering 1,000 tweets per query, I barely found 10-20 real human users.

I am kind of surprised the number of human users is so high. I know a lot of bloggers of various sizes. To the best of my knowledge virtually all of them have hooked up one of the available services to post on their behalf. They either spend a little time scheduling out their tweets for the next week/month then forget about twitter until their schedule runs dry OR they set something up to randomly pick from some pool of blog posts and spam links.

Either way they then essentially never go on twitter again once things are up and running. The whole thing is full of bots talking to each other.

I occasionally fetch some works from Japanese artists (SFW and not) from twitter, since not all of them use pixiv, and as far as my browsing experience goes, their retweets, conversations, mentions and so on all seem to point to other humans.

Likes, Followers and accounts yelling into some hashtags can be dominated by bots. But if you look at the feeds of individual people and those that they talk with, that's a totally different experience.

> Tumblr today is drowning in porn.

Tumblr has had NSFW for a long time now. If anything some of it may have migrated to patreon now that it is easier to monetize.

Isn't that kind of expected? The rate of real users pretty much matches the rate of real users on websites.

Would also be interesting to run a similar analysis on FB; I'd expect similar rates there.

I would love to get copies of your script if you wanted to email a link. I'm not so interested in bots but I am interested in social network analysis.

I don't know about Tumblr wrt porn - my impression is that there's a lot of sexual material there but that it's community-driven rather than commercial, partly due to the demographics of its user base.

> Tumblr today is drowning in porn.

'Drowning' in porn? In the sense that that's a bad thing?