by greenone 3057 days ago
That's like saying "having a public website is an invitation to DoS attacks."

There are conventions and reasonable expectations. Until now I did not expect that a tracking pixel would be the basis for crawling. So far most crawlers tend to crawl what's publicly linked, not what's potentially publicly reachable if one knows every URL there is.

2 comments

Posting a file to a public web server is an implicit invitation for clients (human or automated) to download that file. That's why "secret URLs" are universally considered to provide very little security.

There are common conventions (not always followed) around robots.txt and what files to crawl, but I'm not aware of any rules or conventions or standards around URL discovery. Plenty of crawlers attempt to crawl every registered domain name, for example.
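For anyone unfamiliar with the robots.txt convention, here is a minimal sketch of how a well-behaved crawler might check a URL before fetching it, using Python's standard `urllib.robotparser`. The robots.txt content, the `examplebot` user agent, the `may_fetch` helper, and the example.com URLs are all made up for illustration. Note that this only covers what a crawler is asked not to fetch; it says nothing about how the crawler found the URL in the first place.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL under the disallowed prefix is refused; anything else is allowed.
print(may_fetch(ROBOTS_TXT, "examplebot", "https://example.com/private/pixel.gif"))  # False
print(may_fetch(ROBOTS_TXT, "examplebot", "https://example.com/index.html"))  # True
```

Compliance is entirely voluntary: nothing in the protocol stops a client from skipping this check.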

"DoS attack" is sort of a loaded term since it implies malice. Clearly running a web server doesn't mean you invite malicious attacks (though perhaps you should expect them). Some people consider Googlebot to be a DoS attack since it can easily bring poorly designed sites to their knees.

I watch my site's Google index and I can tell you 100% I never gave Google explicit permission to crawl 90% of the pages that show up there.