Hacker News
by whakim 1597 days ago
It really depends. There are plenty of legitimate uses for scraping (for example, I've worked on academic research that relied on scraping Twitter search results), and in cases like that it's only really feasible to collect the amount of data you need using scraping plus paid proxies. That being said, there are also a number of nefarious paid proxy services which offer residential IPs (read: they are usually botnets).
1 comment

What is legitimate to a user is not the same as what is legitimate to a site owner. The legitimate way would probably be to use the Twitter API.
The Twitter API has very low rate limits (from a data collection perspective). While there may be good reasons for that, these limits also preclude doing public interest research of the type we were doing (how Twitter's various search filters influence the political leanings of search results). When companies have Twitter's level of societal influence, I think it's also possible to define "legitimate use" in terms of public interest, rather than simply "users" or "site owners."