by jmyeet 1448 days ago
I'm torn on Web scraping because the extremes at both ends of the spectrum on this issue seem unreasonable.

On one side, you have people who say any form of scraping should be disallowed, even prosecutable. This went so far that the Department of Justice, on behalf of AT&T, prosecuted a case of URL modification [1]. One of the few bright spots for this psychotic Supreme Court was to curtail the government's power under the CFAA by limiting what constituted "unauthorized" access [2].

On the other side, there are those who think that any level of scraping should be fine, and I think that's untenable too. Consider Yahoo's indexing of Stack Overflow [3]:

> In the meantime, since Yahoo (via Slurp!) is about 0.3% of our traffic, but insists on rudely consuming a huge chunk of our prime-time bandwidth, they’re getting IP banned and blocked.

Do these "scraping extremists" think such actions should be illegal? It's actually not that far-fetched given that the Ninth Circuit decided LinkedIn wrongly blocked hiQ's scraping [4]. Like, if you change your website with the intent of making scraping more difficult, is that a problem? What if it's an unintended side effect?

Additionally, companies like Meta, Google and Apple are going to be held way more accountable for abiding by data retention laws and regulations than any scraper. If it's OK to scrape FB.com completely, that information is out there forever.

I certainly think the government shouldn't prosecute on behalf of companies. If nothing else, doing so shows people that the government's #1 priority is in fact to protect its true constituents: corporations and the capital-owning class.

[1]: https://www.techdirt.com/2013/09/30/dojs-insane-argument-aga...

[2]: https://en.wikipedia.org/wiki/Van_Buren_v._United_States

[3]: https://stackoverflow.blog/2009/06/16/the-perfect-web-spider...

[4]: https://blog.ericgoldman.org/archives/2019/09/ninth-circuit-...

1 comment

> So much about this case is ridiculous, and it’s complicated by the fact that nearly everyone agrees that weev is a world-class jerk. But, you need to separate that out from the details of what he did here, to note that it was nothing particularly special, and it involved the sort of thing that security researchers do all the time, and which all sorts of non-security researchers do quite often.

Yeah... uhm... I used to do exactly this sort of thing...

When I was a teenager, I would look at the URL of whatever site I was on, and would change a number here, or a letter there; and see what I got.

Sometimes you get nothing, sometimes you get something. Sometimes that something is quite interesting.