Hacker News
by adolph 1968 days ago
In what research contexts is using the API valid, as opposed to scraping a view closer to what people actually experience? If the Twitter site and API are retroactively cleared of removed or suspended accounts that had a large impact, how does that affect retrospective studies?

Are there ethical implications of working with Twitter to gather data? Even with Twitter's TOS, legal review, and IRB approval all in order, are there informed-consent issues in studying the artifacts of social media use?

2 comments

Until now, none, I think. The API gave only a partial view, while scraping returned all tweets for a particular search term. The scraper had to be clever to juke the anti-scraping systems, but you would get a more complete data set than with the API.

And the streaming API was terrible. Even when there was no data on the stream, you could consume tens of gigabytes of bandwidth a day. Dreadful.

One easy example is language: tracking the spread of new words or other language constructs. You don’t care how the site looks; you care about the text that was input.
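
For what it's worth, here is a rough sketch of that kind of word-tracking analysis in Python. It assumes you already have the tweets as (date, text) pairs, however they were collected; the function name `daily_term_counts` and the sample data are made up for illustration, and nothing here calls a real Twitter API.

```python
import re
from collections import Counter
from datetime import date


def daily_term_counts(tweets, term):
    """Count, per day, how many tweets contain `term` as a whole word.

    `tweets` is an iterable of (date, text) pairs. Matching is
    case-insensitive and only looks at the text, not at how it was displayed.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for day, text in tweets:
        if pattern.search(text):
            counts[day] += 1
    # Return the days in chronological order.
    return dict(sorted(counts.items()))


# Hypothetical sample data, invented for this example.
sample = [
    (date(2020, 1, 1), "nothing new here"),
    (date(2020, 1, 2), "that take is so yeet"),
    (date(2020, 1, 3), "Yeet! everyone says yeet now"),
    (date(2020, 1, 3), "yeet again"),
]

counts = daily_term_counts(sample, "yeet")
for day, n in counts.items():
    print(day.isoformat(), n)
```

Each tweet is counted at most once per day no matter how often the word repeats in it, which is one choice among several; counting every occurrence instead would only need `len(pattern.findall(text))`.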