by metadat 1443 days ago
While you are technically correct, these sorts of tarpit cycle patterns should be detected and shed by any half-decent crawling system ("half-decent" is said tongue-in-cheek :p).

Does Marginalia struggle to identify these sorts of structures and avoid indexing them?

1 comments

I don't mean to suggest these structures are actually common. I'm using them to illustrate how the concept of the internet having a size falls apart once you consider the practical matter of actually counting it. The idea hinges on a files-on-a-server model that hasn't matched how websites work for the last 30-or-so years.

Counting websites, or even delineating where a website starts and ends, is difficult. For one thing, you can have a single server hosting infinite websites like I described. Services like Cloudflare also throw a spanner in the works, if you think maybe using server IP would help. Domain name isn't much use either, as that would discount hosting services like Neocities, where many separate sites live under one domain.
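To make that concrete, here is a rough Python sketch of how the "number of websites" changes depending on what you key on. The URLs, usernames, and IP addresses are made up for illustration, and the registered-domain helper is a deliberately naive assumption (last two labels); a real crawler would need something like the Public Suffix List.

```python
from urllib.parse import urlsplit

# Hypothetical sample: URL -> IP address the host resolves to.
# Hostnames, usernames, and addresses are invented for illustration only.
PAGES = {
    "https://alice.neocities.org/": "198.51.100.7",
    "https://bob.neocities.org/blog": "198.51.100.7",
    "https://example.com/": "203.0.113.5",
    "https://www.example.com/about": "203.0.113.5",
    "https://example.org/": "203.0.113.5",  # unrelated site behind the same proxy IP
}


def hostname(url):
    """Return the host part of a URL."""
    return urlsplit(url).hostname


def naive_registered_domain(url):
    """Assumption: the registered domain is the last two labels of the host.

    This is wrong for suffixes like .co.uk; it is only here for the sketch.
    """
    return ".".join(hostname(url).split(".")[-2:])


def count_sites(pages):
    """Count 'websites' under three different definitions of a site."""
    return {
        "by_ip": len(set(pages.values())),
        "by_registered_domain": len({naive_registered_domain(u) for u in pages}),
        "by_hostname": len({hostname(u) for u in pages}),
    }


counts = count_sites(PAGES)
print(counts)
```

With this sample, the same five pages count as 2, 3, or 5 websites depending on the key, and none of the three is obviously the right answer: IP merges unrelated sites behind a shared proxy, registered domain merges the Neocities users, and hostname splits example.com from www.example.com.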

There's a similar fractal of weird cases with counting documents on a given webserver (and by extension the internet).