Is it legal and possible to scrape the social media platforms?
13 points by iamnnk 658 days ago
Given links to posts, is it legal & possible to scrape from social media such as YT, FB, Insta, TikTok & Snap?

If yes, do they block beyond a certain number of hits? Is there a paid route to doing this at scale?

Are there popular libraries/packages that let us do this and that are kept up to date with the T&C of these platforms?

7 comments

Most of the social media companies are scraping everything to train their LLMs. I think we’ll see some court decisions soon regarding legality.

Some of the social platforms have APIs you can pay to access. Some have aggressive anti-scraping countermeasures.

hiQ vs. LinkedIn determined this. If the content is available without a login, it's fair game. If a login is required, then it's not. That's why Twitter now requires a login to view extended content.
Legal, yes, as long as you are not accessing stuff you are not supposed to.

Possible, very much so, just depends on the platform and the rate of access that they allow. Some platforms will basically rate limit hard if they detect a lot of traffic from a single IP.

With paid API access, you may have a higher rate available and an easier time getting the data (usually without having to parse HTML).
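On the point about platforms rate limiting hard when they see a lot of traffic from one IP: a minimal client-side throttling sketch follows. It assumes the platform signals "slow down" with HTTP 429 or 503, optionally with a `Retry-After` value in seconds. `PoliteThrottle` is a hypothetical helper, and the 2-second interval and 300-second cap are arbitrary placeholders, not any platform's real limits.

```python
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Hypothetical helper: spaces out requests and backs off when throttled.

    The default numbers are illustrative assumptions, not published limits.
    """

    def __init__(
        self,
        min_interval: float = 2.0,
        max_backoff: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self.max_backoff = max_backoff
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None
        self._backoff = 0.0

    def wait(self) -> None:
        """Block until the next request is allowed, then mark it as sent."""
        delay = max(self.min_interval, self._backoff)
        if self._last is not None:
            remaining = delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()

    def record(self, status_code: int, retry_after: Optional[float] = None) -> None:
        """Update the backoff after a response; 429/503 are treated as 'slow down'."""
        if status_code in (429, 503):
            if retry_after is not None:
                # Honor the server's hint (assumed already parsed to seconds).
                self._backoff = min(retry_after, self.max_backoff)
            else:
                # Exponential backoff: double the delay each time we are throttled.
                self._backoff = min(
                    max(self._backoff, self.min_interval) * 2, self.max_backoff
                )
        else:
            # A normal response resets us to the base interval.
            self._backoff = 0.0
```

The clock and sleep functions are injectable so the timing logic can be tested without real waits. `Retry-After` can also be an HTTP date; this sketch only handles the seconds form and leaves parsing to the caller.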

Generally speaking, if you're not logged in and nobody has told you to stop, you should be ok.
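One machine-readable way a site "tells you to stop" is its robots.txt file. A minimal sketch using Python's standard `urllib.robotparser` to check a URL against rules you have already downloaded; the robots.txt content, the `MyBot` user agent, the URLs and the `is_allowed` helper are all hypothetical. robots.txt is a convention, not a legal test.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules we have already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(EXAMPLE_ROBOTS, "MyBot", "https://example.com/private/post/1"))  # False
print(is_allowed(EXAMPLE_ROBOTS, "MyBot", "https://example.com/public/post/1"))   # True
```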

There is a service called SerpAPI that provides an API around stuff you might scrape. Haven't tried it myself but heard it's good.

This question is way too broad. What is your purpose? What specifically are you scraping (i.e., images, text, audio, video)? Please expand.
I want to scrape all possible items of a post: video or photo when that's there, the caption, the hashtags, location tag.

If not all of them are possible, which are? I want to let my users derive searchable intelligence from this info; the intent is commercial.
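Of the fields listed, hashtags can be derived from the caption text itself once you have it. A minimal sketch, assuming a hashtag is `#` followed by word characters; each platform's real hashtag rules may differ, and `extract_hashtags` is a hypothetical helper.

```python
import re
from typing import List

# Assumption: a hashtag is "#" followed by one or more Unicode word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(caption: str) -> List[str]:
    """Return hashtags from a caption, lowercased, de-duplicated, in first-seen order."""
    tags = (match.lower() for match in HASHTAG_RE.findall(caption))
    return list(dict.fromkeys(tags))


print(extract_hashtags("Sunset at the pier #Travel #sunset #travel"))
# ['travel', 'sunset']
```

Lowercasing is a design choice for search: `#Travel` and `#travel` land on the same key.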

I'm not a professional, but it depends on the social media platform (scraping Instagram is not the same as scraping X). Also, social media platforms like Instagram fight scraping (blocking you if too many requests are made), and they change their HTML tags. But like I said, I'm not a professional at scraping and I could be making some mistakes.
Approaching the platforms adversarially makes you an adversary. This might not be a solid foundation for a stable business.

Your lawyer is the best opinion regarding legality.

Good luck.

they's already scraped bro

they's on the archive dot org

that site has everything but their search is shite

so to find scraped things that they scraped, you need to scrape their site and build a non-broken search engine for yourself

but you'll find your post-scraped social media sites

and many other interdasting things
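A minimal sketch of the "scrape archive.org and build your own search" idea. It assumes the Wayback Machine's public CDX endpoint (`https://web.archive.org/cdx/search/cdx` with `url`, `output=json` and `limit` parameters), which returns a header row followed by one row per capture, and the `https://web.archive.org/web/<timestamp>/<original>` snapshot URL pattern. The helper names and the toy `TinyIndex` are hypothetical; actually fetching pages (ideally rate-limited) is left out.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List, Set
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, limit: int = 50) -> str:
    """Build a CDX API query listing archived captures of `target`."""
    params = {"url": target, "output": "json", "limit": limit}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> List[Dict[str, str]]:
    """Turn CDX JSON (header row + data rows) into a list of dicts."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def snapshot_url(capture: Dict[str, str]) -> str:
    """Wayback URL for a single capture (assumed URL pattern)."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"


class TinyIndex:
    """Toy inverted index: word -> set of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents containing every query word (AND search)."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result
```

Snapshot URLs from `snapshot_url` make natural document ids for `TinyIndex.add`, so a search hit points straight back to the archived page.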