by hansvm
2017 days ago
(2) is a great idea I hadn't considered. A surprising number of sites require "browser" user-agents but otherwise have well-defined rate limits, robots.txt files, and everything you'd need to write a respectful crawler. I'm not sure that (4) matters for larger sites? Whatever their rate limits allow is usually a drop in the bucket compared to their background traffic.

Generally, though, unless you screw up badly, submit forms, or blend in with a more problematic crawler, nobody’s going to care (or even notice).
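
For what it's worth, here's roughly what I mean by a respectful crawler, as a minimal Python sketch using the standard library's `urllib.robotparser`. The user-agent string, the 1-second fallback delay, and the helper names (`load_rules`, `allowed`, `crawl_delay`, `Throttle`) are all made up for illustration; a real site may publish different limits or none at all.

```python
import time
import urllib.robotparser

# Hypothetical browser-like user-agent; pick something honest for a real crawler.
USER_AGENT = "Mozilla/5.0 (compatible; ExampleCrawler/0.1)"
# Assumed fallback when robots.txt gives no Crawl-delay.
DEFAULT_DELAY_S = 1.0


def load_rules(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, url):
    """True if the rules permit our user-agent to fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


def crawl_delay(parser):
    """Crawl-delay from robots.txt in seconds, or the assumed default."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else DEFAULT_DELAY_S


class Throttle:
    """Sleeps as needed so calls to wait() are at least delay_s apart."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self._last = None

    def wait(self):
        if self._last is not None:
            remaining = self.delay_s - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

You'd call `allowed(rules, url)` before each request and `throttle.wait()` right before sending it, with `throttle = Throttle(crawl_delay(rules))`. The actual fetching is left out on purpose, since that part depends on whatever HTTP client you use.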