Hacker News
by zkid18 693 days ago
Many would oppose the idea, but if any service (e.g. eBay, LinkedIn, Facebook) were to dump a snapshot to S3 every month, that could be a solution. You can't prevent scraping anyway.
4 comments

We publish a live stream of minutely updated OpenStreetMap data in ready-to-digest form on https://planet.openstreetmap.org/ and S3. Scraping of our data still happens.

Our S3 bucket is thankfully supported by the AWS Open Data Sponsorship Program.
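
For anyone wondering what consuming the minutely feed instead of scraping might look like, here is a rough sketch. It assumes the replication layout under https://planet.openstreetmap.org/replication/minute/: a state.txt holding the latest sequence number, and each diff stored at a path built from the zero-padded sequence number. The function names are made up for illustration and are not part of any OSM tooling.

```python
import urllib.request

# Assumed base URL for minutely replication diffs.
BASE = "https://planet.openstreetmap.org/replication/minute"


def parse_state(text: str) -> dict:
    """Parse a state.txt body (Java-properties style) into a dict of strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the leading comment line
        key, _, value = line.partition("=")
        # Properties files escape colons in timestamps as "\:".
        state[key] = value.replace("\\:", ":")
    return state


def diff_url(sequence: int, base: str = BASE) -> str:
    """Build the URL of the .osc.gz diff for a given sequence number."""
    padded = f"{sequence:09d}"  # e.g. 4865466 -> "004865466"
    return f"{base}/{padded[0:3]}/{padded[3:6]}/{padded[6:9]}.osc.gz"


def fetch_latest_state(base: str = BASE) -> dict:
    """Download and parse the current state.txt (needs network access)."""
    with urllib.request.urlopen(f"{base}/state.txt", timeout=30) as resp:
        return parse_state(resp.read().decode("utf-8"))
```

A client would call fetch_latest_state(), read sequenceNumber, and download diff_url(int(...)) for each sequence it has not yet applied, which is one small request per minute rather than a crawl.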

Would the snapshot contain the same info (beyond any doubt) that an actual user would see if they opened LinkedIn/Facebook/Service from Canada on an iPhone on a Saturday morning (for example)? If not, the snapshot is useless for some use cases and we are back to scraping.
Data from S3 isn't free, though. It still costs money and has a limit based on the tier you purchase.
Yeah, you can get dumps of Wikipedia and Stack Overflow/Stack Exchange that way.

(Not sure if created by the admins or a 3rd party, but done once for many is better than overlapping individual efforts).