| HN Mirror

by gambler 3354 days ago

I'm willing to bet no one* does their own search crawling aside from companies the size of Yandex and larger. Google carefully manipulated web standards to make sure you can't do that effectively without tons of upfront investment. You pretty much have to run a customized headless browser to get real content. And then you have to figure out how to interact with whatever you get, since an increasing number of websites are SPAs. Google itself has it easy, since developers actively modify their sites to fit Google's capabilities.

But hey, everything is "fine" as long as the Web keeps a bunch of developers employed with six-digit salaries. They will put up with any amount of accidental complexity and ignore any effects on future innovation as long as their jobs are secure. (And those jobs are more secure than ever, because you need an ever-increasing number of specialized professionals to keep the increasingly complex technology stacks operational.)

* One exception I know of: Web Archive. But their coverage is pretty spotty and they aren't, strictly speaking, a search engine. Still, it's an awesome effort. At least someone tries to swim against the tide.

1 comment

ronsor 3354 days ago

My search engine crawls its own results. The downside is that the index is very tiny, under 100,000 pages.