Hacker News
by marginalia_nu 1165 days ago
It wouldn't. But such islands are typically not very interesting either. The context of who links to a domain is very important for a search engine for many tasks, not just discovery.

Very cool. The reason I ask is that, at first glance, the header "Search the Internet" implies to me that you are searching the entire Internet. It sounds like a more appropriate header would be "Search the obscure Internet".
To be fair, no search engine lets you search the entire Internet; not even Google does this.

The Internet arguably doesn't even have a size. You can construct a website where each page n.example.com/m links to '(n+1).example.com/m' and 'n.example.com/(m+1)', for each m and n between 0 and 1e308.
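
To make the construction concrete, here is a minimal sketch of the link rule such a site could use. The function name `outgoing_links` and the `base` parameter are made up for illustration; nothing here is a real server, it only computes which two pages a given page would point to.

```python
def outgoing_links(n: int, m: int, base: str = "example.com") -> list[str]:
    """Return the two links found on the hypothetical page n.<base>/m."""
    return [
        f"{n + 1}.{base}/{m}",  # next subdomain, same path
        f"{n}.{base}/{m + 1}",  # same subdomain, next path
    ]


if __name__ == "__main__":
    # The page 0.example.com/0 points at two new pages, each of which
    # points at further pages, so following links never runs out.
    print(outgoing_links(0, 0))
```

Since every page is generated from its own address, the server needs no storage at all, yet a crawler following links keeps finding pages it hasn't seen.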

I did it! For every pair of numbers, calc.shpakovsky.ru has a static(-looking) webpage showing their sum (or difference, etc.), together with links to several other pages. The only limitation I know of is the 4k URL length. Interestingly enough, major search engines are rather smart about it and cooled down their indexing efforts after some time. I guess I'm not the first one to make such a website.
Haha, nice! Crawler traps are quite an old phenomenon. They've been around since before Google.

Dunno about the others, but my crawler has a set depth it will crawl. It'll BFS for like 1000-10000 documents depending on some factors.
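
A rough sketch of what a budgeted BFS like that might look like, run against a simulated crawler trap instead of the network. This is not the actual crawler's code: `bfs_crawl`, `trap_links` and the `max_docs` parameter are hypothetical names, and the fixed budget of 1000 is just the low end of the range mentioned above.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_crawl(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_docs: int = 1000,
) -> list[str]:
    """Visit pages breadth-first, stopping after max_docs documents."""
    crawled: list[str] = []
    seen = {seed}          # URLs already queued, to avoid revisiting
    queue = deque([seed])  # FIFO queue gives breadth-first order
    while queue and len(crawled) < max_docs:
        url = queue.popleft()
        crawled.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled


def trap_links(url: str) -> list[str]:
    """Simulated infinite site: n.example.com/m links to its two neighbours."""
    host, path = url.split("/")
    n, m = int(host.split(".")[0]), int(path)
    return [f"{n + 1}.example.com/{m}", f"{n}.example.com/{m + 1}"]


if __name__ == "__main__":
    pages = bfs_crawl("0.example.com/0", trap_links, max_docs=1000)
    print(len(pages), pages[:3])
```

The budget is what makes the crawl terminate: the simulated site never runs out of links, but the loop stops once `max_docs` documents have been fetched.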