Hacker News
Twitter improves API usage for researchers (blog.twitter.com)
117 points by jansenmac 1968 days ago
13 comments

We've been running our startup [1] in this "media research industry" for a while now. We're on the classic media use case side.

It is true that the vast majority of "research" is done by non-academics. Lots of companies doing market research want to mine media data.

Still, I believe that this "social media research" is a bit overvalued. There was this wave of "social media is the primary source where information appears". But now many have realized how freaking difficult it is to separate this data from the noise, compared to traditional news published by journalists.

Also, take a look at this article [2] about how Dataminr sells insights from Twitter data to foreign governments (2017). Seems like just a way to punish the opposition channels.

[1] https://newscatcherapi.com/

[2] https://www.theverge.com/2017/1/27/14412014/dataminr-twitter...

It's a shame access to this API is limited to academic institutions, as many social media/misinformation researchers are now independent or affiliated with journalistic institutions.
Misinformation and troll farms on Twitter start with unrestricted API access for the masses.
Misinformation & troll farms appear to do just fine with the current restricted API.

What evidence is there that read-only API access will make it significantly easier for them (enough to outweigh the other upsides)?

So you're telling me APIs are not useful while at the same time telling me that APIs are useful. Trolls want API access for the same reason researchers want access. You could argue research is the first step to becoming an effective troll.
I don't see API access as being necessary for a small-scale trolling operation, and large-scale operations have enough resources to work around the lack of an API by scraping.

I'm not saying that API access is completely useless. I was raising the question of whether the potential (and relatively small) benefit to trolling outweighs the major upsides of API access being available to all.

In what research contexts is API usage valid instead of scraping a view more similar to what people experience? If the Twitter site and API are retrospectively cleared of removed/suspended accounts with large impact, how does that affect retrospective studies?

Are there ethical implications of working with Twitter to gather data? Despite Twitter TOS, legal, IRB ok, are there informed consent issues in studying the artifacts of social media use?

Until now, none, I think. The API only gave a partial view, while scraping offered all tweets for a particular search term. The scraper had to be clever to juke the anti-scraping systems, but you would get a more complete data set than by using the API.

And the streaming API was terrible. Even if there was no data on the stream you could consume tens of gigabytes of bandwidth a day. Dreadful.

One easy example is language: tracking the spread of new words or other language constructs. You don’t care how the site looks, you care about the text that was previously input.
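
As a rough sketch of that word-tracking idea, counting how often a term shows up per day could look something like this in Python. Nothing here touches a real Twitter endpoint: the `daily_term_counts` helper, the `(date, text)` input shape, and the sample posts are all made up for illustration.

```python
import re
from collections import Counter
from datetime import date


def daily_term_counts(posts, term):
    """Count, per day, how many posts contain `term` as a whole word.

    `posts` is an iterable of (date, text) pairs. This shape is an
    assumption for the sketch, not any real API's format.
    """
    # Whole-word, case-insensitive match, so "yeeted" does not count as "yeet".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for day, text in posts:
        if pattern.search(text):
            counts[day] += 1
    # Return a plain dict ordered by day.
    return dict(sorted(counts.items()))


# Hypothetical sample data.
sample_posts = [
    (date(2021, 1, 1), "just learned the word yeet"),
    (date(2021, 1, 2), "Yeet! said everyone"),
    (date(2021, 1, 2), "yeet the noise out of the data"),
    (date(2021, 1, 2), "nothing to see here"),
    (date(2021, 1, 3), "yeeted is not the same token"),
]

result = daily_term_counts(sample_posts, "yeet")
```

Only the text and a timestamp are needed here, which is the point: how the site renders a tweet is irrelevant to this kind of study.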
I wonder how they are going to enforce their rules, e.g. non-commercial use. I assume this will require some monitoring to be effective. Large-scale Twitter API access is typically pricey, so malicious actors might try to buy or steal researchers' credentials to cut costs.
I recently tried to sign up for the Twitter API and the process is nothing like what it used to be. You have to give them a lot of information to even qualify, such as what you're going to use it for. It used to be that those were just some fields you needed to fill out, and you could sign up immediately. But nowadays the application process requires direct approval from their team, which means they're monitoring every API account like Apple does with its App Store. And if you lie about your usage, you are probably liable.
> You are either a master’s student, doctoral candidate, post-doc, faculty, or research-focused employee at an academic institution or university.

This is gross. Rather than using the internet as a democratizing force for education, they restrict the program to those already inside credential-granting institutions. So much great research has been done from outside the institution and yet Twitter is actively pushing outsiders to resort to scraping.

Is it me or is Twitter making a lot of announcements this month?
Twitter makes it easier for Researchers to use tweets and the Twitter API for research.
Based on those UI screens: why is research always associated with academia?

There are a lot of strong people, especially in CS, who do not work in academia and still work on interesting stuff.

Absolutely. The limitation to people associated with academic institutions is pretty old-fashioned (and also, from another point of view, modern).
Because it's easy for Twitter to draw the line between commercial and non-commercial use?
Maybe it has to do with control. Student==likely low impact. Faculty==likely uncontroversial. API==looks like transparency.
I always wondered: how many of these tweets are just Justin Bieber fandom type posts, bots, spam, or other dross? Twitter is infamous for its bad signal-to-noise ratio. These researchers need to write algos to filter out all the noise.
What's wrong with that? If you wanted to investigate, say, the rise of The Beatles - wouldn't you love to have access to the random thoughts of their fans in the 1960s?

Similarly, if you're researching bots and spam and how they manipulate people & markets - this is still useful.

Are there researchers out there using bot accounts to prospectively experiment with social media researchers?
A lot! I ran named entity recognition on the Twitter garden hose back in 2012. The top entities were Bieber and the Jonas brothers.
Is it better than the previous time that they changed it?
Researchers like Joan Donovan?
Could have saved you some work, the research results are that Twitter is very liberal, less conservative and nobody has seen a libertarian for days.
Where's your sources, numbers, methodology? I mean anyone can make a kneejerk statement based on their perception (read: bubble), but that's not science.
You should follow different people.
I'd think libertarians would love Twitter. Deplatforming is the free market at work, and the government has no right to make them do business with anyone they choose not to.
Twitter should be judged by the way it governs its platform. And from libertarian perspective it's governed poorly. Sure, under current laws they can get away with deplatforming in the way they do now, but there is nothing commendable or desirable about it for libertarians specifically.
And one might think liberals, putting liberty above order, would be aghast at silencing opponents to maintain order, and yet here we are. Our political theater has gotten pretty weird. The whole thing is looking more and more like unprincipled tribes vying for power.
"silencing opponents"

Nobody here is silenced. You do not have a right to a Twitter account. People can, and do, make accounts elsewhere.

Literally anything Trump does can be, and is, covered by the media; this is exactly the opposite of being "silenced".

I was not talking about Twitter accounts. I do not care what Donald Trump does or does not do.
Silence is a strong word -- I'm certainly still able to hear news from Donald Trump, even if he was banned from Twitter.
As someone who doesn't try, I still hear news about Trump, but no longer from Trump. That's a big (welcome) change.
You're right about Trump. Most people don't have the platform he has.

Many dissenting voices have been removed from the conversation, and you wouldn't even know they are missing.

Except I keep hearing about it constantly, everywhere, including Twitter. And yet I haven't seen a single one being banned for discussing fiscal policy, less regulation, conservative views on social programs etc. "Conservative voices [being] silenced" are almost always some variation of spamming evidently real-world damaging conspiracy theories or clear ToS violations.
You would think, from the number of Republican/conservative accounts they have banned, that they have some AI or parser dedicated to banning these types of voices.