by capitainenemo 616 days ago
Personally, it wasn't licenses or attribution that was the problem with ByteDance's scraping; it was that, unlike every other robot visiting our system, they completely ignored robots.txt, to the point of overloading systems.

Which is why their chunk of Amazon Asia is currently behind a ban.

I kinda feel like when people say "indiscriminate" they really mean it. There is no regard for courtesy or common sense.

1 comment

For someone who has the resources, I can think of a lot more fun things than a ban.

I think there was a story a while ago, possibly apocryphal, about someone who ran a disposable email service with a bunch of random-looking domains. They noticed bot traffic repeatedly hitting the page that shows one of their domains, but never clicking through to actually activate an address. Guessing that the scraper was trying to find and block all these domains from being used to sign up for its operator's services, the admin of the disposable email site added a function: if it detected bot traffic, it would occasionally return domains like "gmail.com" in the text field.
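
For the curious, the trick could look roughly like the sketch below. Everything in it is made up for illustration: the domain lists, the 20% decoy rate, and the "many page views, zero activations" bot heuristic are all assumptions, since the story doesn't say how the real site detected bots or how often it served a decoy.

```python
import random

# Hypothetical values; the real service's domains, decoys, and rate are unknown.
REAL_DOMAINS = ["xkq7mail.example", "zt4box.example", "pluvro.example"]
DECOY_DOMAINS = ["gmail.com"]
DECOY_RATE = 0.2  # assumed fraction of suspected-bot requests that get a decoy


def looks_like_bot(page_views: int, activations: int) -> bool:
    """Crude stand-in heuristic: many domain-page views, no address activated."""
    return page_views >= 10 and activations == 0


def pick_domain(page_views: int, activations: int, rng: random.Random) -> str:
    """Choose the domain to show in the text field for this client."""
    if looks_like_bot(page_views, activations) and rng.random() < DECOY_RATE:
        # Suspected scraper: occasionally hand back a mainstream domain.
        return rng.choice(DECOY_DOMAINS)
    # Everyone else (and most bot requests) sees a real disposable domain.
    return rng.choice(REAL_DOMAINS)
```

Serving the decoy only occasionally is the point: the scraper's list still looks mostly plausible, so a poisoned entry is less likely to be noticed before it gets used as a blocklist.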