|
15 years ago, I could reasonably write a search engine. Myself. 1 person. In a few weeks (modulo bandwidth and server farm). I write a program that grabs a web page, and reads out keywords. Today, if I grab a web page, quite often, that web page has nothing except for JavaScript code. That code grabs the actual content from the server, lays it out, and animates it. To write a web search engine, I need to write a complete JavaScript library. At the time, we were talking about developing all sorts of agents. Things that would shop for you. Things that would find parts for you. Thinks that would remember what web sites you visited, and let you search them. Things that would track where in a long set of pages you were (blog, comic, etc.), and let you keep reading from there. It happened for a while, and then it died when the web became too damn hard. Writing anything that can reasonably see and parse web pages now takes many, many web years. There are only four or five organizations with that kind of resources (WebKit, Mozilla, Opera, IE, and internally, Google). There are countless things we just didn't even imagine. It's like the DMCA. You notice all the innovations that happen, but you miss all the innovations it made impossible. |
No, 15 years ago you could reasonably write a search engine for 15 years ago. It would suck by today's standards.
You want to handle Javascript? Easy! There are plenty of tools to choose from now. Run a browser as your crawler, visit the sites, and read the generated source instead of the static source. Shove that into your 15-years-ago search engine, and there's no difference.
>Things that would track where in a long set of pages you were
You mean bookmarks? Add a scroll %, assuming they're not nice enough to use anchor tags / IDs meaningfully, and you're golden.
>Writing anything that can reasonably see and parse web pages...
has become a community effort, instead of a bunch of isolated silos where people reinvented the wheel out of necessity.
The resources required aren't so large just because it's so much more complex, it's large because it's so much faster, and you won't survive if you can't compete. How long did we languish with crappy Javascript engines? How much would you need to know to actively compete in that section alone now? It's easy to make a slow-but-functional browser, and if you looked around you'd see some people doing just that. Making a fast-and-resilient one is as hard as making a fast-and-resilient anything, especially where human input (ie, HTML) is expected to be consumed.