by clamprecht 3302 days ago
Put the Terms and Conditions (the part relevant to scraping) in the /robots.txt as well.
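For anyone wondering what that might look like in practice, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `/members/` path, the `custom-scraper` user agent, and the `example.com` URLs are all made up for illustration; they are not taken from the site being discussed. The terms go in as a comment, which parsers ignore, alongside rules that a well-behaved scraper can actually check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the comment line carries the scraping terms,
# and the rules below it express the same policy in machine-readable form.
ROBOTS_TXT = """\
# Terms: automated scraping of /members/ is prohibited. See /terms.
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /members/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot is explicitly allowed everywhere in this made-up policy.
googlebot_ok = parser.can_fetch("Googlebot", "https://example.com/members/page")

# Any other agent falls through to the "*" rules and is kept out of /members/.
scraper_members_ok = parser.can_fetch("custom-scraper", "https://example.com/members/page")
scraper_home_ok = parser.can_fetch("custom-scraper", "https://example.com/")
```

Of course this only helps with scrapers that bother to check; it is a statement of intent, not an access control.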

Yes. Did that after this episode.
Were you seriously expecting bots to read your T&C? Or anyone, for that matter? Did you mention that it was okay for Google to scrape your site?
We're not talking generic "bots".

We're talking a custom scraper written for this site and this site only.

Yes, I am expecting the people who spend hours inspecting the source of my site, and then writing a custom scraper for it, to spend 30 seconds reading the T&Cs first.

Not sure why you'd expect that. If my web browser can download your source code, my software will as well.

If you want people to read it, put your content behind a sign-up with a checkbox.

It is _already_ behind a sign-up with a checkbox. They scraped their way past that too.
How? (Seriously, how does one do this?)
Ah, that changes things somewhat.