by bityard
505 days ago
Please put a priority on making it hard to abuse the web with your tool. At a _bare_ minimum, that means obeying robots.txt and NOT crawling a site that doesn't want to be crawled. And there should not be an option to override that.

It goes without saying that you should not allow users to make hundreds or thousands of "blind" parallel requests, as these tend to have the effect of DoSing sites that are hosted on modest hardware. You should also be measuring response times and throttling your requests accordingly. If a website issues a response code or other signal that you are hitting it too fast or too often, slow down.

I say this because since around the start of the new year, AI bots have been ravaging what's left of the open web and causing REAL stress and problems for admins of small and mid-sized websites and their human visitors: https://www.heise.de/en/news/AI-bots-paralyze-Linux-news-sit...
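For what it's worth, here is a rough sketch of what that could look like in Python. It is only an illustration under stated assumptions, not how any particular tool does it: `USER_AGENT`, `build_robots`, `is_allowed`, and `next_delay` are hypothetical names, and the thresholds (1 second minimum, 10x the response time, doubling on 429/503) are arbitrary choices. The robots.txt handling uses the standard library's `urllib.robotparser`.

```python
import urllib.robotparser

USER_AGENT = "example-crawler"  # hypothetical user-agent string


def build_robots(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse the text of a robots.txt file that was already downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: urllib.robotparser.RobotFileParser, url: str) -> bool:
    """Check robots.txt for this URL. Deliberately has no override flag."""
    return parser.can_fetch(USER_AGENT, url)


def next_delay(current, response_time, status, retry_after=None,
               min_delay=1.0, max_delay=300.0):
    """Return how many seconds to wait before the next request to a host.

    Assumed policy (not a standard): back off hard when the server
    signals overload, otherwise scale the wait with how slowly it answers.
    """
    if status in (429, 503):
        # Server says "too fast": at least double the current delay,
        # and honor a Retry-After value (in seconds) if one was sent.
        backoff = current * 2
        if retry_after is not None:
            backoff = max(backoff, retry_after)
        return min(max_delay, max(min_delay, backoff))
    # Healthy response: wait roughly 10x the time the server needed,
    # so a slow (likely loaded) server gets hit less often.
    return min(max_delay, max(min_delay, 10 * response_time))
```

A crawler built this way would call `is_allowed` before every fetch, keep one delay value per host, and make requests to a given host one at a time rather than in parallel.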
The comparison to DRM makes sense. Gimping software to disempower the end user based on the desires of content publishers. There's even probably a valid syllogism that could make you bite the bullet on browsers forcing you to render ads.