Hacker News
by a_chris 1538 days ago
Can anyone explain to me how these services are legal? I haven't read Instagram's terms and conditions, but I'm pretty sure there are plenty of clauses against scraping, copying, and distributing their data, particularly to make money.

How is this possible?

3 comments

I don't follow this topic closely, but it is definitely a legal grey area and the subject of frequent debate (and lawsuits).

To summarize at a high level:

A frequent allegation is that this is unauthorized access to computer systems. The scrapers argue that this is public data, so they are just accessing it. Their access isn't meaningfully different from that of regular users, who are allowed in. From their point of view, if the service doesn't want to share the data, it shouldn't make it available.

Another common accusation is breaching the ToS. Generally the defense is that they didn't agree to any contract.

A last resort is some sort of copyright claim. Generally the scrapers will argue that the data can't be copyrighted, isn't owned by the service, or that some sort of license was given (back to the public data argument).

Of course every case is different and has different points but these are the common ones that I have seen.

Potentially useful reference regarding the status of one of the most important such lawsuits:

https://news.bloomberglaw.com/us-law-week/supreme-court-scra...

Yeah, post-LinkedIn: that case gave the green light to scrape any publicly available information. Craigslist bullied scrapers via lawsuits (the EFF covered it), but post-LinkedIn there have been zero grounds for Craigslist to use the DDoS argument (since the website is built to handle far more traffic than scrapers can generate).
Breach of ToS has nothing to do with legality. It's definitely a breach of ToS, but legality will depend on the local jurisdiction, and enforcement will depend on whether the user is within reach of a legal system that cares about it (good luck when the user is anonymous or based in Russia or another US-unfriendly country).
The simple answer is: this is not legal and also doesn't work at scale. Try running this type of scraping for a few thousand profiles and you will quickly be restricted.
It's definitely a breach of ToS, but I wouldn't be so quick to call it illegal. It's a grey area that has yet to be properly litigated - I think the closest we've got is the LinkedIn scraping case, and I don't remember whether that one even reached a conclusive answer.

In fact, this is one of the downsides of the US legal system: litigation is so expensive that nobody dares to try it, even though it could set a legal precedent that would benefit society at large. This is IMO something a consumer-friendly regulatory environment (such as the EU) should settle in advance, as with the GDPR for example, but given that they can't even be bothered to enforce that effectively, I don't have much hope (if they enforced it, it would actually remove a big use case for scraping Instagram, as you would be able to use the official clients without compromising your privacy).

AFAIK the latest status of the LinkedIn case is still inconclusive (due to the Supreme Court stepping in).

https://news.bloomberglaw.com/us-law-week/supreme-court-scra...

You are wrong. This is not illegal. With a 4G/LTE proxy machine you can easily generate thousands of profiles rapidly and cheaply. They would be able to detect them at some point (it will be harder if you go slowly), but it wouldn't stop the scraping.

The only way to stop it is for Instagram to restrict registration altogether, but that might create a black market where existing users sell their accounts, and it would cannibalize Instagram's own user base (bad for Meta's stock price).

I may be wrong about this being illegal (depending on the country you reside in), but it is certainly not an approach that scales. Meta/Instagram have multiple teams dedicated to preventing this type of scraping. Unless you're willing to invest an equivalent level of resources, any success in scraping Instagram data will be temporary.
> it is certainly not an approach that scales

If there's demand for their service, I don't see why it wouldn't scale. Get more phones and more SIM cards, and build automation around all this infrastructure to handle as much of the work as possible.

> Meta/Instagram have multiple teams dedicated to preventing this type of scraping

That's great, but ultimately they still have a weakness: they want people to be able to see their stuff - at least some of it - without logging in. As long as you can either simulate a normal device perfectly or, even better, use real devices or virtualize them, there isn't much that Facebook can do without impacting legitimate usage, which they don't want.