
Hacker News

by adinosaur123 1644 days ago

I don't wish to hijack this thread, but I've been pondering a similar question. I've been working on a product that requires a very large amount of data that, as far as I can tell, can only be gathered by scraping (real estate data; even data vendors like estated.com don't have things like sales data).

Many, many websites contain legal language that forbids automatic data collection/scraping. How can a business be built in such a case?

Perhaps OP's tool only scrapes a select few sites that don't prohibit scraping, but that seems like the exception, not the norm.


GhettoComputers 1644 days ago

Do it manually. I wonder what "automatically" or "scraping" actually means legally. Those requirements are pretty hard to enforce, since I assume search providers break them all the time.

If it's a derivative work like Copilot, I wonder if there's a legal case to say you can't do it. I assume you're doing something like an RSS feed for pricing suggestions with commissions? I just looked this up, and it seems scraping it is legal, but their information is copyrighted. https://law.stackexchange.com/questions/15556/is-scraping-re...


nlh 1644 days ago

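Read up on hiQ Labs v. LinkedIn. As long as that ruling holds (and it might not), the tl;dr is: if it's on the open web, you can scrape it. You might be violating some Terms of Service (that you never agreed to), but you're not violating (US) law; specifically, the ruling was that scraping public pages isn't unauthorized access under the Computer Fraud and Abuse Act.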

If it's NOT on the public web (e.g. it's behind a login), then you can be sued: you'll have had to explicitly agree to the Terms of Service when creating your account, so scraping puts you in explicit violation of that ToS.
