by rundigen12 3077 days ago
Re. blocking scrapers: Some of us are neither vast corporate espionage practitioners nor zombie-botnet users: we're on our own, scraping for data science & other academic research purposes.

Is there some way to declare, "I am a legitimate academic user", something akin to 'TSA Pre' status?

"Sure, register for & use the site's API," you'll say. What if they don't have one?

"Sure, just don't slam the server with too many requests in a short time," you'll say. But if they're rejecting you just because they detect you're headless, etc...?
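
For what "not slamming the server" could look like in practice, here is a minimal sketch, assuming a single-threaded scraper that talks to one host. The names `PoliteThrottle` and `allowed_by_robots` are hypothetical, and the interval is whatever you consider reasonable for the site. It does nothing about headless detection; it only covers the rate and robots.txt side.

```python
import time
from urllib import robotparser


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests to one host.

    Hypothetical helper: clock and sleep are injectable so it can be tested
    without actually waiting.
    """

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt content that was already downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

You'd call `throttle.wait()` before each request and skip any URL for which `allowed_by_robots(...)` is false.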

1 comments

> But if they're rejecting you just because they detect you're headless, etc

Isn't that their right?

If I pay for my outgoing bandwidth (even if I don't) I am under no obligation to give my content/data/whatever to any third party source, even academic.

> If I pay for my outgoing bandwidth (even if I don't) I am under no obligation to give my content/data/whatever to any third party source, even academic.

Aren't you? You put a server on the publicly routable Internet. And made it talk over HTTP. At this point I believe you've already chosen to waive your rights not to serve content.

Isn't that the same argument regarding ripping music CDs? If I pay for the musicians, manufacturing and distribution costs to put a CD in stores, etc.

Although I think you're framing it wrong: you're not obligated to give the content; someone is just choosing to consume it in a way you hadn't intended.

You're free to stop providing the content at all.

But as long as you're providing it publicly, it makes no sense that you'd be able to dictate how it's consumed.