by nl
5261 days ago
What package out there implements the algorithms for this, and is well-documented and trivial enough to use that a 14-year-old can understand them?

Nutch[1]. Nutch doesn't deal with modern web spam particularly well, but I'd say it matches early Google pretty well. Specifically, it implements PageRank and has a reliable web crawler and a web-scale data store.

[1] http://nutch.apache.org/about.html
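
For anyone who wants to see the core idea before diving into Nutch, here is a rough sketch of PageRank as plain power iteration over a tiny in-memory link graph. This is not Nutch's code or API; the function name `pagerank`, the `links` dictionary format, and the defaults (damping 0.85, 50 iterations) are all my own assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration (illustrative, not Nutch's implementation).

    links: dict mapping a page name to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        # Every other page splits its rank equally among its outgoing links.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


if __name__ == "__main__":
    # Hypothetical four-page web, made up for this example.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

A real crawler-scale system has to do this over billions of links on a distributed store, which is the part a package like Nutch is supposed to take care of; the sketch only shows the ranking step.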