by greenone
3057 days ago
That's like saying "having a public website is an invitation to DoS attacks." There are conventions and reasonable expectations. Until now I did not expect a tracking pixel to be the basis for crawling; so far, most crawlers tend to crawl what's publicly linked, not what's potentially publicly reachable if one knows every URL there is.
There are common conventions (not always followed) around robots.txt and what files to crawl, but I'm not aware of any rules or conventions or standards around URL discovery. Plenty of crawlers attempt to crawl every registered domain name, for example.
"DOS Attack" is sort of a loaded term since it implies malice. Clearly running a web server doesn't mean you invite malicious attacks (though perhaps you should expect them). Some people consider Googlebot to be a DOS attack since it can easily bring poorly designed sites to their knees.