Hacker News
by g-clef 1500 days ago
Oh, do I have notes on their methodology.

1) They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely. Frankly, until recently, my twitter account would have been one of the ones they would have discarded as inactive. This one thing alone makes me question all of the rest of their results.

2) By the same token, the rate or frequency with which a user sends tweets has no relation to whether a user is monetizable. If they're seeing ads, they're monetizable...lurkers are just as monetizable as high-volume posters.

27 comments

You seem to be arguing against something that the article doesn't claim. The article isn't equating inactivity with fake/spam; it claims that, of the accounts that actively send tweets, ~20% are fake/spam.

Sure that's a different question from what proportion of all users are fake/spam, but this is still a perfectly valid question to ask, and the fact that they're only considering active users is in the title so I really don't get your complaint.

If you want an analysis that attempts to answer a different question go find or write one that addresses the question you want answered...

The article clearly states (emphasis mine):

> This represents the largest set of accounts on Twitter we could acquire, but it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit Twitter’s definition of mDAUs (monetizable Daily Active Users).

From the linked Twitter earnings report:

> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in or were otherwise authenticated and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.

EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

The fact that you had to do this proves the point. Nobody defines "active" the way they have here. The claim is nonsense.

That edit was made 40m before you joined the conversation. Noting your edits is a social convention and voluntary concession offered by a post's author to validate replies that were made before the edit, while clarifying the author's intended message for future readers. If those future readers use the content of the edit message to shallowly refute the post, consider the incentive this creates to not follow that convention for all authors in the future. If you have a valid refutation, surely you can find evidence for such in the body of the message rather than nitpicking the edit history.
I think you misunderstood their response. They are saying that the study has an unusual definition of "active", and that your need to clarify the definition proves that it is unusual.

Though personally I think filtering specifically for users that actively send tweets makes sense, since that's really what matters when it comes to measuring how healthy and authentic the discourse is

What is the proper definition of "active"?

It seems like everyone is arguing about different metrics and it makes more sense to discuss different, specific measures that might fall into a range of behaviors that are "active" in some sense rather than focusing on which definition of "active" is somehow the best one.

What would be more interesting would be to adapt this and answer several different questions about the proportion of spam among accounts with different metrics of activity to see how things change. For example, does the percentage of spam accounts go down a lot if we lower the bar for "active"? How much & how fast?

> What is the proper definition of "active"?

Twitter's quarterly earnings define active users thusly:

> Twitter defines monetizable daily active usage or users (mDAU) as people, organizations, or other accounts who logged in or were otherwise authenticated and accessed Twitter on any given day through twitter.com, Twitter applications that are able to show ads, or paid Twitter products, including subscriptions.

https://s22.q4cdn.com/826641620/files/doc_financials/2022/q1...

I'm pretty sure I've heard a similar definition from Facebook.

This definition supports g-clef's critique that the article picks an unorthodox way to measure active users, resulting in an inflated percentage of accounts being measured as spam/fake accounts, vs what the percentage would be if measured against Twitter's definition of 'active', which includes lurkers.

Strange rant. It's not about you editing your post in general. It's that your edit shows that saying "active accounts" when you really mean "accounts that have recently tweeted" is wrong, like the very title of this submission.
The point is that their definition of active is inaccurate. You can be an active user and not tweet.
Look, there are dozens of potentially interesting and valuable questions to ask on this subject. Answers to which may produce a wide range of insights and conclusions. And there's a whole potential conversation about which questions are most important, that may have different answers depending on the context.

But there's no reason to pin the whole frame of the conversation to the one question for which Twitter corporate chose to publish an answer, unless the only question we are interested in is "did Twitter technically lie" which is the most uninteresting question in this whole situation. If this is the sole context you are using to frame this issue then maybe you should consider if you're following the current news cycle a little too closely.

The idea that there is such a thing as an 'inaccurate definition of active' is silly.

In the light of Musk's statements, which presumably precipitated this timely article, I would say the question of whether Twitter technically lied is the most important question for Musk doing the things he does.

If you're more interested in Twitter's ecosystem as a whole, it is less interesting.

At every company I've worked at any time someone has asked "How many active users do we have?" it was a difficult question to answer because everyone's idea of "active user" was different.

"Active, as in logs in regularly? Wait, what is 'regularly'? Once a week? Once a month? Every day? Does 'active user' mean, online right now?"

Etc, etc...

Their definition of "active user" is relative, not inaccurate.

>"did Twitter technically lie" which is the most uninteresting question in this whole situation.

I don't know, that seems more interesting than most questions that could be asked about Twitter.

> I don't know, that seems more interesting than most questions that could be asked about Twitter.

Why? Twitter is a for-profit corporation. If, on balance, lying serves their interests (I'm sorry, I meant "is consistent with their fiduciary duty to their shareholders") more than edging up to the line without crossing it, that's what they will do.

Even the watchdog organizations such as the FTC and SEC that police the speech of corporations more or less limit themselves to material statements that move markets or influence consumer behavior in ways that can be considered fraudulent. The FTC, FDA, and others are concerned with a fairly narrow reading of consumer harm; the SEC is motivated by the health and trustworthiness of the public market. In any case, there pretty much always has to be some sort of alleged harm. Lying per se is hardly ever forbidden. So if the advantages of a lie outweigh the (risk adjusted) penalties and reputational risks, that's that.

I think a conversation about what ways we expect and permit corporations to lie, either specifically in financial statements or to the general public, is much more interesting than a discussion of exactly how many fake tweets there are and exactly how many accounts are making them, though I guess you could construe that as broadly part of the same conversation.
Sure. But if I'm looking to purchase Twitter, I think I'd be much more interested in and concerned about this "white" lie than you are as a general consumer.
I think it's pretty easy to argue that their definition is intentionally misleading, which may not be technically inaccurate, but is arguably just as bad.

The big story in the news last week was "Elon Musk says deal on hold while verifying twitter's 5% Monthly Active Users stat", or something to that effect.

That's the context this article was published in. It is transparently obvious they are reusing the phrase "active Twitter accounts" to cause confusion with the definition of "active" that has been bandied about. The post uses such a title as clickbait, to hop aboard a trend.

I think the title, and lack of significant clarification in the article, make it clearly misleading, and I don't think pedantic "well technically active can have multiple definitions" changes the reality of the situation meaningfully.

But their definition makes things look worse for them. The high number of lurkers would make the percentage of fake accounts smaller.
I don't understand what you mean.

Let's take both their numbers at face-value and assume they're true.

Twitter has reported: 396.5 million logged-in-this-month users, of which 5% are fake/spam (19.8 million fake users)

This article reported: Looked at 44,058 tweeted-recently accounts, of which 20% are fake (8,800 fake)

Which of those stats looks worse for them?

> The high number of lurkers would make the percentage of fake accounts smaller

Why? Twitter included lurkers in its dataset, this article didn't, why should that impact stats in the direction of fake accounts being smaller?
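For reference, the two fake-account counts quoted above can be checked with a few lines of Python. This is only a sanity check of the arithmetic on the figures as quoted in the comment; the variable names are made up for illustration.

```python
# Figures as quoted in the comment above, taken at face value.
twitter_users = 396_500_000      # "logged-in-this-month" users, as quoted
twitter_fake_rate = 0.05         # Twitter's reported fake/spam share
sample_accounts = 44_058         # "tweeted-recently" accounts in the article's sample
sample_fake_rate = 0.20          # the article's fake/spam share (rounded to 20%)

twitter_fake = twitter_users * twitter_fake_rate
sample_fake = sample_accounts * sample_fake_rate

print(f"Twitter figure: about {twitter_fake:,.0f} fake accounts")
print(f"Article sample: about {sample_fake:,.0f} fake accounts")
```

The two counts come from different populations (all logged-in users versus a sample of accounts that recently tweeted), so the percentages are not directly comparable.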

I thought the parent was criticizing Twitter's active monthly user definition, which only includes people who have tweeted in the past 90 days. The article used this definition of active use as well.
Twitter requires users to log in before lurking so their definition of activity is intentionally selective. I'd be surprised if Twitter doesn't know how active their users actually are, even the lurkers.
I read lots of tweets and don't have a Twitter account, or at least one that I've logged into in the last 10 years... The philosophical question seems to be, "am I a Twitter user"?

You could probably argue that most of the world reads Twitter and hence counts as users, account or none. It's that pervasive.

But then there's the next question: "am I a user that reportedly matters to Twitter's business?". What people are trying to land on, in light of Elon's tweet that the deal is on hold pending investigation of Twitter's metrics reporting, seems to be a framework for carving out what exactly constitutes a user that brings the platform revenue that shows up in quarterly reports and hence would directly relate to the tangible value of the enterprise.

In reality, nobody knows what numbers are being thrown around behind closed doors. This article is just one framing.

For most of Twitter’s existence, this was me. I used Twitter a lot but I never tweeted.
They define it in the article as accounts that have tweeted within the previous 9 weeks. If you are lurking and not tweeting you are not "active".
> of the accounts that are active ~20% are fake/spam.

Nope. ~20% of accounts that tweet are fake. A lurker (aka read-only) is by all meanings an active account.

It's not an active account if by "active" they mean "generating content". While Twitter isn't a typical content aggregation site like YouTube or Reddit, tweets are still "content" in the sense that they drive further user engagement on the site.
Words used to mean things. The current HN submission title just says active, heavily implying accounts with any kind of activity (eg. like, follow/unfollow), not "users who Tweet".

Sure, clickbait headlines are the norm and the devil lurks in the details, but still, many comments have been spent on this, because it's clearly misleading.

~80% of email is spam, it doesn't surprise anyone, because it's so cheap to send spam. Similarly it's easy to create fake accounts and spam, yet it doesn't mean much.

Who's counted as "engaged"? The people reading, or only the people writing? More to the point, if Twitter moved to a subscription model, would zero lurkers buy in?
Seems like social networks aren't interested in counting those that don't use all potential features of the platform. I'd say a lurker/ghost member is definitely an active account.
I would say that if someone is able to be advertised to (since that is what makes the business money) then they should be counted. So yes, there should be no requirement to tweet to be counted.
Absolutely. They're earning money from these users anyway, so definitely count them.
Not according to the article, which is the point...
"Active user" is a common industry term with a well-defined meaning. It's misleading to use it to mean something else, particularly when there are a number of more appropriate choices, e.g. "20% of Twitter posters".
It isn't misleading when the article itself explains how they are using the term.
The article clearly defines those accounts as "active" because it's the only way an external observer can somehow isolate an "active" group. Only twitter can know how many users are "lurkers".

And since they are trying most probably to get some PR for their company, they use their specific definition of "active Twitter account".

You are an inactive user. According to me, being an inactive user is making a comment I disagree with.
When you are in a context where:
- Twitter determines the active status of an account using login
- People are wondering about the % of active users as defined by Twitter's metrics

but you then use your own definition of active, write only a one-liner on the difference with no reflection on the impact it might have, and give no warning that you are answering a different question, then my conclusion is that you want people to make this mistake.

> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

Made me laugh because you had to add it and made more effort than the author of the article to prevent the confusion :D.

Interesting. This could be a bracketing error, because I read

> it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit

> Twitter’s definition of mDAUs (monetizable Daily Active Users)

As implying that they think accounts that haven't tweeted in the past 90 days don't fit Twitter's mDAU definition. Given the placement of the qualifying phrase, I think that's a reasonable parsing of the sentence, but I see your point that they could be trying to imply their set doesn't fit the definition. If so, that sentence is very badly constructed.

The full quote doesn't do SparkToro and Followerwonk any credit:

> Followerwonk selected a random sample from only those accounts that had public tweets published to their profile in the last 90 days, a clear indication of “activity.” Further, Followerwonk regularly updates its profile database (every 30 days) to remove any protected or deleted accounts. We believe this sample is both large enough in size to be statistically significant, and curated to most closely resemble what Twitter might consider a monetizable Daily Active User (mDAU).

The fact that they don't even consider the concept of a non-tweeting lurker to be an mDAU brings their entire analysis into question. Let's face it - Twitter is an emotionally-charged enough place, and tweets have such a way of living forever and being taken out of context, that there are many who use it to consume (and perhaps Like) content but will not tweet publicly. These people are still viewing and engaging with advertisements! Twitter absolutely should consider them monetizable!

But of course, engagement data on lurkers is internal only, and Likes data counts against global API caps: https://developer.twitter.com/en/docs/twitter-api/tweets/lik.... Which means that SparkToro and Followerwonk are incentivized to ignore these users. That they do ignore them, and don't address it anywhere in their methodology, is highly suspect.

The article is just clickbait. The title is obviously clickbait (based on your edit you've realized that "active account" !== "accounts that tweet"). Then they try to define active account:

> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”

Ok, but "consuming the activity on their timeline" is essentially unknowable outside of Twitter, since you can't see what tweets people are viewing. It turns out they're trying to infer this through some other signals like follower count, etc. But you can imagine why that might be sketchy.

Then they constrain the analysis:

> A more fair assessment of Mr. Musk’s Twitter following would only include accounts that have tweeted in the past 90 days

Let's be real, if you look at a list of Elon tweet replies, they might as well all be spam. Just search @elonmusk and sort by latest. Then compare that to the sorted tweet replies under an actual tweet. IDK how many millions of dollars and man-hours went into the AI that sorted this list, but it seems to just be putting the blue checks at the top and shrugging at the rest. I doubt this three man team is doing any better at spam detection.

For manipulation / spam purposes I don't really care about accounts that don't actively post/like/retweet/follow. The mDAU isn't useful at all for determining if the activity on Twitter is done largely by bots.
I do wonder how "fake" is calculated. Is @tweetsfrommydog fake? It's a real person making tweets that are funny and provide value to the platform, but it's not a real person as an individual tweeting their personal thoughts. Are corporate accounts or parody accounts fake?
It is valid criticism because the context of this article is that Elon Musk wants to know whether Twitter's own claims of ~5% fake/spam accounts is accurate. We do really want an analysis that investigates that precise question and not a related one.
Elon Musk waived his right to due diligence ... more fool him.

You can file this in the 'pedo guy' cabinet of his life story where his child-ego got the better of his undoubted business skills.

He can do due diligence but from what I heard (correct me if I am wrong) he has to pay a heavy penalty ($1B) if he backs out.
According to Matt Levine, that's "not how any of this works". The $1B is if he could not secure financing, but it appears we are now past that point. The relevant question is whether the Twitter board wants to sue in court to compel a sale.

Given what Musk does to the personal lives of his opponents, I'm not sure I would want to fight him. But given how many laws and rules he's broken at this point, I think there is a clear failure of justice if he can just do whatever he feels like without repercussions due to his common popularity.

What does he do to the personal lives of his opponents? And why would the board not do their fiduciary duty out of fear of that?
Lurkers are also the most important people. They consume the content. They are the meat of the business, the ones that respond to advertising and political messaging. If I were Twitter I would champion all the lurker accounts, all the eyeballs to which Twitter serves content. Nobody ever faulted the Nielsen ratings scheme for "lurker" viewers who only watched but didn't themselves create television shows.
Definitely agree. I joined Twitter four months ago. I haven't tweeted yet, but I'm reading it daily on the app and occasionally liking tweets.

I've been so surprised at how effective the advertising has been on me. I've never experienced this level of engagement with online marketing. Ads for TV shows, movies, live shows, musicians and comedians have been particularly effective.

I've found myself following a lot of show writers I've never heard of, and I even signed up for some new streaming services because of it. Google and Facebook ads never felt like they impacted me, though I know how important and dominant they are to business marketers. I've never clicked on a banner ad and my eyes glaze over sponsored links. Twitter's level of engagement with their marketing content is new to me, and I'm impressed.

I actively work to block or prevent ad tracking. When youtube serves me an ad for retirement planning or feminine hygiene products, that is my little victory. That is me successfully preventing them from knowing enough about me to target ads.
Furthermore, there are the non-tweeting active users (ones who like only) and the ones who RT a lot but don't create organic tweets.

Those are indeed incredibly valuable. Engaged audience = your real audience.

Unlike passive media consumption though, Twitter needs users to submit content (tweets, replies) to give lurkers something to do.
Yes and no. Just like on any major media platform, the huge majority of tweets being seen are from a very small group of influencers/popular people. That's why when you join Twitter, it suggests a lot of people to follow who are already big.
There's only a yes in your answer.
No. You can have only 10% of accounts actively tweeting and the rest just consuming what those post. All those - active and not - are monetizable
You don't really need that many people to submit content though. I imagine most YouTube users have never uploaded a single video, and they don't need to, since there's basically no end to available content there.
Twitter specifically added the annoying feature of your likes being shown to your followers so that lurkers would be actively contributing to the algorithm though.

As long as lurkers are "liking" content, their local network will see an engagement increase.

This is a second-order objective though. The goal is to show ads to humans on the platform. Having a lot of human authors (or any kind of content authors) generating content is a way to achieve the goal, not a goal in itself.

There are other ways to achieve the goal, such as making ads more relevant (targeted advertising), having users consume more of the same content (recommendation), having the same content take longer to consume (periscope). Growing the number of human posters is definitely not a requirement.

The people who create content do it in such massive amounts that this never seems to be an issue.
And I thought it was common knowledge that lurkers always vastly outnumber people who post content on any platform. If lurkers outnumber posters by at least 3:1, then 20% goes to 5% and twitter’s “<5%” figure is accurate.
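
The 3:1 arithmetic above can be checked with a small sketch. It assumes, as the comment implicitly does, that no lurker accounts are fake (the optional `lurker_fake_rate` parameter relaxes that). The function name and parameters are illustrative only; nothing here comes from the article or from Twitter.

```python
def overall_fake_share(poster_fake_rate: float,
                       lurkers_per_poster: float,
                       lurker_fake_rate: float = 0.0) -> float:
    """Fake share across posters plus lurkers, normalised to one poster."""
    fake = poster_fake_rate + lurkers_per_poster * lurker_fake_rate
    total = 1.0 + lurkers_per_poster
    return fake / total


# 20% of posters fake, with varying (hypothetical) lurker-to-poster ratios.
for ratio in (0, 3, 8, 12):
    share = overall_fake_share(0.20, ratio)
    print(f"{ratio}:1 lurkers to posters -> {share:.1%} fake overall")
```

At 3:1 the overall share is exactly 5%, and any higher ratio pushes it below 5%, provided the lurkers themselves are all real.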
Lurkers are probably anywhere between 8-12:1. People actually posting stuff on the internet are in the vast, vast minority, which creates a sort of echo chamber.

I am technically "logged into" twitter so I can click through and read the postage stamp-sized charts linked to through various articles and blogs, or watch a video about a riot in some far flung part of the planet. Once a year I tweet at airlines when they lose my luggage or whatever but otherwise don't tweet. Twitter isn't a good social media service, it just happens to be the image/video sharing platform of choice for journalists to promote themselves.

> That seems like a huge bias - lurkers exist

I created an account 5 years ago, followed one or two people, got bored and never logged in again.

Presumably their intention is to exclude abandoned accounts, like mine - is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?

As a third party? Probably not. Which is why it's going to be very hard to disprove Twitter's assertion unless Twitter chooses to share their data.

That's part of why I find articles like this frustrating: I don't think they have the data to actually answer the question they're attempting to answer. Knowing that, what's the purpose of the article?

> Which is why it's going to be very hard to disprove Twitter's assertion unless Twitter chooses to share their data.

It's impossible to disprove Twitter's assertion because they never claimed that less than 5% of their accounts are spam. From their quarterly earnings:

>We define monetizable daily active usage or users (mDAU) as Twitter users who logged in or were otherwise authenticated and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.

>... mDAU does not include users accessing Twitter through third-party applications.

Their statement said that less than 5% of their monetizable daily active users are spam. There very well could be 50% of the entire user base as bots or spam, but that doesn't negate the metric Twitter releases.

This doesn’t resolve the issue the article has though. I’m a mDAU because I’ve logged in, yet there is no way for the people writing the article to know that I’m active.
Yeah the article has a few big issues, yours is definitely at the top.
They could maybe use like activity in addition to just tweets? Inherently though this system is going to be less accurate than the dataset that Twitter has access to. If a large chunk of users only engage in Twitter through DMs then an external organization isn’t going to have insight into that.
I would imagine Twitter would have access to analytics that third parties don't have, which would allow them to pretty easily work out which accounts are logged in and used for browsing and which are actually abandoned.
As a small complication: I have a twitter account, doubt I've ever tweeted. I browse twitter quite often, but I'm _never_ logged in.

No idea if I should be counted or not in any particular bucket, or how anyone would know.

AFAIK, you're not counted in any bucket. That's one reason TWTR wants you to log in to read. So active user numbers go up.
I thought they used a banner that pretty much forced you to log in to see more than the first few tweets in a thread now (same as instagram)?

I have an account that is logged in, but it has only sent 7 tweets since 2014 (and they're only to customer service accounts).

It doesn't seem to. A few months back it was ~forcing for a bit so I moved to nitter.
What client are you using that allows you to browse without logging in?
Opening a Twitter link in a private tab is the low complexity solution, or there's nitter.net, or deleting cookies, or various browser extensions that delete cookies for you.
I had no idea it was so strict. I just use Firefox. Their cookie behavior must be picky enough that it bypasses whatever nonsense Twitter is doing?
Firefox :shrug: . It's never forced me. When it gets too annoying trying to push me to login I move to nitter.
Private browsing mode ("incognito" in Chrome)
> is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?

No. Which is why the only reasonable thing to say as an external party is "we don't know."

If an account is in lurk mode, then it's not a spammer so I'm okay with it being left out of that equation.

Where I might agree with you is a lurk mode account could become collateral damage in being considered fake. Lurkers don't retweet though. An account with a million followers isn't seen by everyone. Having a portion of that million like/retweet amplifies even further with their network now possibly seeing something from someone they are not following directly.

I'd be willing to accept that the number of lurkers that get lumped in with fake accounts when deciding the percentage of actual eyeballs on posts is not harmful. Those numbers are made up stats anyways. Like the old days of TV/Radio stations that covered large cities with millions of citizens. They would claim they have an audience in the millions even though a small fraction were actually watching/listening.

Except the question isn't about the pure number of spam/bot accounts, it's about the ratio of spam/bots to "authentic" users. If you leave out the lurkers, that ratio gets skewed to mistakenly inflate the bot count.
First off, I don't give 2 shits about twitter, so I don't care if the numbers are skewed in either direction. This is more of an interest in seeing how SV stats/metrics are just a game. Just so that's out there.

A lurker isn't an active user in my opinion. Maybe that's not the same understanding as the accepted definition. The lurkers might be absorbing some of the ad content, but they are not helping create new avenues for ads to be shared. Twitter's ad share surface area would increase tremendously if every user was actively producing tweets. That's the only metric that they are concerned with. They don't care about how many people actually see the ads once they are there. They make their money on the potential eyeballs alone. Lurkers are not helping increase those numbers.

> They make their money on the potential eyeballs alone. Lurkers are not helping increase those numbers.

I don’t follow this. Lurkers are the eyeballs presumably.

If everyone on twitter tweeted the same amount it would probably just drown out the popular accounts and create a more diffuse and less profitable ad space I think.

>> They make their money on the potential eyeballs alone. Lurkers are not helping increase those numbers.

> I don’t follow this. Lurkers are the eyeballs presumably.

The number of eyeballs allows for the price per ad to increase, while the number of places ads can be placed increases the volume of ads. If lurkers are not helping to increase the volume, it doesn't make the platform as much money. Proving the lurkers are actually consuming the ads and making the ad buyer happy is non-trivial. Proving the lurkers are worth increasing the price per ad is also non-trivial. In the end, I personally feel like it is a wash by lurkers being overly represented in the fake account numbers.

Compare Twitter ads to the ads in a newspaper or something. 100% of a newspaper's readers are lurkers, but ads still seem to be worth more than $0.
Volume of ads is irrelevant. An additional tweet to attach an ad to does not generate revenue if there is nobody looking at it. On the other hand, though, an additional set of eyeballs on an existing monetized tweet does generate additional revenue.

As an extreme example, a single monetized tweet with a billion viewers generates money. A billion monetized tweets with one viewer obviously do not.

Why are lurkers not helping numbers? It's the exact same as YouTube. Do you expect the majority of lurkers on YouTube to not be counted because they didn't create a video? People follow what is already out there and ads target the people watching.
Lurkers are the eyeballs…
And yet I find 20% more believable than under 5%

Edit: I guess it's true that lurkers won't be bots, unless they are clicking on ads or trying to simulate engagement to help certain twitter accounts seem popular.

All those fake followers you can buy could just as well be "inactive" lurkers though.
That means that 20% of the posts that I see, as a lurker, are generated by bots. The bots are having a huge influence on conversations, and that's important to know.
> That means that 20% of the posts that I see, as a lurker, are generated by bots

I don't see how you can arrive at this conclusion. It depends on who you are following, with some additions by the algorithm (unless you use the chronological feed) and (speculating here) the algo pushes content from real humans.

I read tweet replies, not just tweets (apologies if I'm not using the correct terms, I'm not an active Twitter user). The original tweet may be a real user, but I often dive deep into all of the comments. If 20% of those comments are from bots, then that's a lot.
No, since you choose who you follow, you're most likely filtering for interesting stuff. I'd wager that most of the spam bots are pretty obvious to spot, and make up very little of a user's feed.
I rarely read my feed. Most of my lurking is on the replies to famous/infamous tweets.
I don't know how many original tweets are made by bots but 20% of the replies to anyone with a 5 figure follower count seems to fall on the low side of what I would guess.
"Doesn't have a URL in profile" is sort of a weird metric. Not everyone is there to self-promote.
I have a twitter account, but I have never tweeted or retweeted anything.
Same with my account. I only log in from time to time when I am forced to sign in to view something.
It sounds like you are genuinely a non-active user, and probably not interesting from the PoV of Twitter/acquirers or the GP poster. This thread is about lurkers: people who regularly log in and read their feed (thus consuming ads and being relevant from Twitter's business perspective), but who don't post and would thus be excluded using the methodology of TFA.
Why do you say GP is non-active rather than lurking? They do read tweets, see ads, and even have an account that they log into.
I was offered $300 for my twitter account, I suppose partially on the basis that I haven’t tweeted much. I use it daily to weekly but don’t tweet often: one tweet in the last 2 years or so.
Well, I've been actively trying to create a new Twitter account for a little under a month and Twitter thinks I'm a bot. I've made 1 tweet and followed 5 people.

Even paid for Twitter Blue...still thinks I'm not real. Support is unreachable.

My current plan is to wait til Elon completes the takeover and then build an entire site dedicated to getting Elon's attention to unlock my account...because that's the only way to contact somebody apparently.

Have you tried tweeting at them :P

Edit add: I find it horrible that we have companies that you cannot contact; in fact they seem to be going out of their way to make it hard to contact them.

Even things you pay money for, like airline tickets. They want you to email them, make the phone number hard to find. So you do, they don't respond and then you have to search and call them, wait an hour or more on hold. The agents are nice but the entire process is terrible.

Earlier I had to do that for a damaged luggage claim. Went through the automated phone assistant to get to damaged luggage claims and it gave the option to use text messages. So I gave it a try, nope. They can't resolve the issue through text, it has to be on the phone. So I had to call back, re-enter all the info through the automated system and then ignore its pleadings to use the text system.

They were probably forced to, since they do not have access to login information. Especially since an account that logs in but does not post is certainly not a spammer ^^, though it could still be a bot crawling.

But they probably should expand more on this and reflect on how much inaccuracy it adds. With a quick search you can find that fewer than 50% of US users tweet five times a month (https://www.pewresearch.org/fact-tank/2022/03/16/5-facts-abo...). Or the study which reported that the top 25% of users produce 97% of the content, with the median user of the bottom 75% posting 0 tweets a month (https://www.pewresearch.org/internet/2021/11/15/2-comparing-...). Those studies were done using surveys, I believe, so they should include only active users and no spam/bots.

So with random invalid maths, if you make the assumption that the 25% less active users might not even post every two months (exponential decrease of activity ?) then you need to add back a quarter of the 80% they found as active.

Not to say I believe the 5% number from twitter; and I was going to use the price for a thousand followers as an example, but seeing it appears to be at $30 now (https://socialboss.org/buy-twitter-followers/ ?) when I remembered it at like $5, the twitter team might have done some good work ;).

But one can say that 20% of the content on the platform was distributed by bots. Meaning that all the lurkers have to consider whether they are really interested in content that was pushed by some bot farms. Technically, every user of this platform has to take a step back and evaluate whether anything they have seen was content pushed by bots.

20% is huge and I am curious if there will ever be some comparable "official" numbers to that.

No - you can say that 20% of the accounts actively posting are spam/bots.

It's possible they are posting MUCH more or less than 20% of the content.

If these are skewed toward the high end of producers - the 80/20 rule would say that as much as 80% of the content could come from them. Still - it's possible this content isn't interacted with much outside of other bots. You can't draw many conclusions from such a limited data point.
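To make the accounts-versus-content distinction concrete, here is a rough sketch. The function name and the posting-rate ratios are made up for illustration; nothing in the article gives per-account posting rates, so treat the inputs as pure assumptions.

```python
def bot_content_share(bot_account_share, bot_posts_per_account, human_posts_per_account):
    """Fraction of all posts written by bots, given the fraction of posting
    accounts that are bots and (hypothetical) average posting rates."""
    bot_posts = bot_account_share * bot_posts_per_account
    human_posts = (1.0 - bot_account_share) * human_posts_per_account
    return bot_posts / (bot_posts + human_posts)


# Assumed scenarios: bots post as often as humans, 16x as often, or a quarter as often.
for label, bot_rate in [("same rate", 1.0), ("16x humans", 16.0), ("0.25x humans", 0.25)]:
    share = bot_content_share(0.20, bot_rate, 1.0)
    print(f"{label}: {share:.1%} of posts come from bots")
```

With 20% of accounts being bots, the content share only equals 20% when both groups post at the same rate; at a 16x rate the same accounts would produce 80% of posts, and at a quarter of the rate under 6%.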

100% this. I haven’t tweeted in nearly 3 years, and even that was a retweet. But I’m still logged in and consuming crap from Twitter all the time
Same, last tweet from me was in December and I check Twitter daily. My last self-composed tweet is well over 2 years ago.
If it's the 80/20 rule then there's 4x of the other 80.58% that are lurking - which brings down % of fake/spam accounts.
There was this suggestion to conduct a sting operation of displaying captcha to a sample of users to determine the % of the bots.

Probably picking the sample is still challenging but at least can somewhat tell if the accounts in the sample are genuine.
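For what it's worth, the statistics side of such a captcha sample is the easy part. Below is a minimal sketch of a Wilson score interval for the failure rate; the sample size and failure count are invented, and it assumes the sample is random and that failing the captcha actually means "bot" (real humans fail or abandon captchas too, so this would overcount).

```python
import math


def wilson_interval(failures, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = failures / sample_size
    z2 = z * z
    denom = 1.0 + z2 / sample_size
    center = (p + z2 / (2.0 * sample_size)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / sample_size
                               + z2 / (4.0 * sample_size ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical experiment: 1,000 sampled accounts, 50 never solve the captcha.
low, high = wilson_interval(50, 1000)
print(f"estimated bot share: between {low:.1%} and {high:.1%}")
```

The interval shrinks roughly with the square root of the sample size, so the hard part stays where the comment above puts it: picking a sample that is actually representative.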

The method in this article is so flawed that Larry Ellison, founder of a famous law firm, would count as an inactive account since he hasn't tweeted since 2012[0], and he is apparently looking into investing in Twitter[1]. How can he be investing a billion in Twitter when he doesn't use Twitter at all?

[0]https://twitter.com/larryellison?lang=en

[1]https://www.grid.news/story/politics/2022/05/16/larry-elliso...

They point out (inside the article) that their definition of active accounts is a flaw in their methodology. However, I think it's fair to say that while TWTR has better internal insight into an "active user", it's the best approximation one can do from the outside.

I do wonder, given perfect knowledge, how the bot accounts would shake out. What percentage produce content (presumably propaganda, automatic tweets using it as an RSS-like announcement service, and spam) vs follow people (boost follower counts, sell likes)?

>They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.

All true. However, do you really believe that a bot is more likely to be active than a real user? If so, fair play to you. If not, then we would expect inactive users to be bots in an even greater proportion than what we see among active users.

We can argue about what the article did and didn't imply, but what's interesting to me about the issue you raise is that among lurkers there is probably a much lower rate of fake/spam activity, since there are fewer reasons for a bot to log in and not tweet. Couple that with the fact that lurkers are generally the vast majority of users on any platform, and that alone could explain the discrepancy between Twitter's 5% number and SparkToro's 20%.
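A back-of-the-envelope version of that dilution argument, as a sketch only: the 19.42% figure is SparkToro's headline number, while the share of users who post (25%) and the fake rates among lurkers are assumptions picked for illustration, not anything either company has published.

```python
def overall_fake_rate(poster_share, fake_among_posters, fake_among_lurkers):
    """Weighted average of the fake rate across posters and lurkers."""
    lurker_share = 1.0 - poster_share
    return poster_share * fake_among_posters + lurker_share * fake_among_lurkers


# Assumption: only a quarter of users ever post, and no lurkers are fake.
print(f"{overall_fake_rate(0.25, 0.1942, 0.0):.2%}")

# Assumption: same split, but 10% of lurkers are purchased followers or aged accounts.
print(f"{overall_fake_rate(0.25, 0.1942, 0.10):.2%}")
```

Under the first set of assumptions the overall rate lands just under 5%; under the second it is above 12%. So the two headline numbers are compatible only if the lurker population really is mostly clean.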
Services that sell followers and spammers "aging" accounts generally would look like lurkers. Twitter could probably get an accurate estimate with the amount of analytics they have for internal use only, but of course they might be incentivized to not try very hard.
Perhaps they are attempting to argue that the value comes from the users that generate content more so than the eyes attached to the account?
> lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.

I've spent many, many hours lurking on twitter, don't have an account at all, and mostly access it through nitter instances. Are they "biased" for not including me?

edit: should inactive users be counted as active users?

Yeah, and I fully expect that these numbers went up recently with Twitter requiring login to view threads.

The fact that they add a .42% is a red flag in itself, especially when they admit in their own post that they agree that their analysis is deserving of critique. Very misleading stuff.

Their analysis using purchased bots seems a bit more reasonable.

“Passive” accounts may actually be more likely to be bots, as many services sell fake followers. It’s just harder to detect with public information rather than their IP addresses etc.

Similarly I don’t think there is any way to separate active vs abandoned passive accounts as a 3rd party.

> They talk about "active" accounts (meaning have tweeted in the last 9 weeks),

This is not their definition, that's what Twitter considers an active account in their revenue reports.

> has no relation

It has some relation, no? I wouldn't be surprised if there is a strong correlation between how frequently a user sends tweets and how monetizable that user is.

Their TL;DAbstract refers to this as a 'conservative' methodology that is 'rigorous' and 'likely undercounts'.

Their definition:

> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”

They note the following to differentiate fake and spam:

> Many “fake” accounts under this definition are neither nefarious nor problematic. ... By contrast, most “spam” accounts are an unwanted nuisance.

Some general data analytics notes from their post:

* They lump together fake and spam in their analysis - and this really matters! Somewhere like NYT is both 'fake', meaning it isn't a real person, and A HIGHLY VALUABLE ACCOUNT for twitter to have.

* They use a sample of 44,058 accounts (of ~1.047B)

* They look at a number of classifying variables (17), spam accounts met 10+ of those 17 criteria. They don't list all 17.

* The criteria were developed from a "machine learning process" that is undescribed, and was developed from a sample of 35,000 'known' fake twitter followers bought from 3 vendors and 50,000 claimed non-spam accounts. They appear (imply?) to have used 50% training 50% real data but don't specify explicitly.

* They say their model is about 65% accurate, and unlikely to produce false positives ("almost never includes false positives") - however they don't list any specificity, sensitivity, etc. that would be useful for evaluating that claim.

* The analysis does no statistical tests, reports no confidence intervals, and gives minimal information about how the model was tested or validated.

* Critically: they note, but do not describe or quantify, that a lot of the criteria are highly correlated

* Then later in the article they suddenly seem to switch from their 17 point scale to a 10 point scale for quality, with a threshold of 3 or below as low quality?

* My personal twitter account meets most of the metrics where they have listed a quantifiable threshold. And their fake followers tool lists it as pretty f'ing suspicious - i.e., low quality.
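To show why the missing sensitivity and specificity matter, here is a sketch of the standard Rogan-Gladen correction for a prevalence measured with an imperfect classifier. The article publishes neither number, so every sensitivity/specificity value below is a placeholder assumption, and reading "65% accurate" as a sensitivity of 0.65 is my guess, not their claim.

```python
def corrected_prevalence(observed_rate, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from a classifier's flagged rate.

    sensitivity: share of real fake/spam accounts the classifier flags.
    specificity: share of genuine accounts the classifier leaves unflagged.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("classifier is no better than chance")
    estimate = (observed_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))  # clamp to a valid proportion


observed = 0.1942  # share of sampled accounts flagged in the article

# Placeholder (sensitivity, specificity) pairs, none taken from the article.
for sens, spec in [(1.00, 1.00), (0.65, 1.00), (0.65, 0.90)]:
    est = corrected_prevalence(observed, sens, spec)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} -> {est:.1%}")
```

Depending on which pair you believe, the same 19.42% reading corresponds to a true rate anywhere from about 17% to about 30%, which is why a headline figure quoted to two decimal places doesn't mean much without those metrics.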

I'm not saying they're wrong, but I am saying good luck getting this from a blog post to any sort of respectable science publication. As they note at the end, they aren't even calculating the same metric - twitter uses monetizable daily active users - remember NYTimes? Absolutely a monetizable account - even if it isn't a real person.

anyone who thinks this is proof of Elon's 4D chess based on this article is, to me, frankly delusional.

Turning my cynicism switch on a bit. The author is a very good content marketer. A hot topic in our corner of the world, which is the author’s target audience, is Elon Musk buying Twitter. Musk tweeted that the percentage of bots is the main issue of the deal. He disputed Twitter’s number of 5%.

I believe the author's writing prompt was just: a headline about fake Twitter accounts showing a number significantly higher than 5%. That’s it. Whatever the methodology, that was the author’s goal.

The article achieved this goal. Everything else is completely irrelevant. Even for the person who wrote it.

> The article achieved this goal. Everything else is completely irrelevant. Even for the person who wrote it.

It's almost like Inception isn't it? A PR stunt within a PR stunt within a PR stunt.

My account was active until recently (deleted when Twitter accepted Musk's offer, I don't need to be a participant in a right wing cesspool). I have 0 tweets. I don't like things, because I don't want my name attached to someone else.