Hacker News
by stingraycharles 2000 days ago
Isn’t this against the license of those sites, though? I’m pretty sure you’re absolutely not allowed to crawl Instagram content, let alone mirror their videos.
2 comments

You implicitly give a license to crawlers if you don’t take action to block them via robots.txt or otherwise block them via your server. If you do either of these, Google will respect the site’s decision, and you could probably take them to court if they tried to evade blockers that block Googlebot (but since Google always respects robots.txt and never crawls from a different ASN or different user agent, even for Safe Browsing crawls, they’re fine).

So if Instagram wants to block Google from downloading their videos, they can add this to their robots.txt:

  User-agent: Googlebot
  Disallow: /video/
(Or however their URL scheme works)
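For anyone who wants to see how a well-behaved crawler would read a rule like that, here is a rough sketch using Python's standard `urllib.robotparser`. The `www.example.com` URLs, the `is_allowed` helper, and the `OtherBot` agent are made up for illustration, and the `/video/` path is just the placeholder from the comment above, not Instagram's actual URL scheme.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /video/ path is a placeholder.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /video/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://www.example.com/video/123"),
        ("Googlebot", "https://www.example.com/about"),
        ("OtherBot", "https://www.example.com/video/123"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that the rule only applies to the named user agent: a crawler that identifies itself differently is not covered unless there is also a `User-agent: *` group. And robots.txt is only a request; nothing in it technically stops a crawler that chooses to ignore it.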
Pretty sure crawling and scraping is legal even if there's a robots.txt.
As long as it’s public. If you need to bypass auth or similar, that’ll get you in trouble.

https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-l...

I would assume Google isn’t just gonna try to start needless fights with Facebook’s lawyers, so it’s likely legal.