by throwawayyyz
4697 days ago
I love how the code is such a mess. You can really tell one guy wrote this whole thing over the span of a decade... It's just one patch on top of another, and the comments are pretty amusing. It's also funny to see hardcoded algorithms for predefined site paths and whole domains such as facebook/myspace/vimeo. This is truly a makeshift search engine on a massive scale. EDIT: Gotta say, this has some very useful pieces of code. I'm working on a niche-specific crawler and am battling the URL stripping/cleanup part of it. This is very useful: https://github.com/gigablast/open-source-search-engine/blob/...
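For anyone fighting the same URL stripping/cleanup problem, here is a minimal sketch of the general idea in Python. It is not the Gigablast code linked above; the function name `clean_url` and the list of tracking parameters are made up for illustration, and it deliberately ignores userinfo and IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical examples of parameters a crawler might drop; not from Gigablast.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"sessionid", "phpsessid", "sid"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def clean_url(url: str) -> str:
    """Return a normalized form of `url` for duplicate detection."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep the port only when it is not the default for the scheme.
    netloc = host
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"

    # Drop tracking/session parameters, then sort so parameter order
    # does not produce distinct URLs for the same page.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_NAMES
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    pairs.sort()

    path = parts.path or "/"
    # The empty last element discards the #fragment.
    return urlunsplit((scheme, netloc, path, urlencode(pairs), ""))


print(clean_url("HTTP://Example.COM:80/a/b?utm_source=x&b=2&a=1#frag"))
```

Sorting the query parameters is a judgment call: it helps deduplication, but a few sites are sensitive to parameter order, so a real crawler may want to make it optional.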
Unfortunately, the document filter in question does spawn child processes, so the normal approach of using fork() and a monitoring process was not working. However, using ulimit like this should work: https://github.com/gigablast/open-source-search-engine/blob/... . I hadn't thought about spawning a new shell and letting it take control like that :)
0: https://github.com/searchdaimon/enterprise-search
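A rough sketch of the ulimit-in-a-shell idea, in Python rather than the C++ of the linked code. The helper name `run_with_ulimit` and its defaults are hypothetical, and it assumes `/bin/sh` supports `ulimit -t` (dash and bash do, though POSIX only requires `-f`). The shell sets the limit on itself and then `exec`s the filter, so the filter and anything it spawns inherit the limit.

```python
import shlex
import subprocess


def run_with_ulimit(argv, cpu_seconds=10, timeout=30):
    """Run `argv` under a shell that first caps CPU time with ulimit.

    Resource limits are inherited across fork/exec, so every child the
    command spawns starts with the same cap (per process, not shared).
    """
    # shlex.join quotes each argument so it is safe to embed in the shell line.
    command = f"ulimit -t {int(cpu_seconds)}; exec {shlex.join(argv)}"
    return subprocess.run(
        ["/bin/sh", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,  # wall-clock backstop for the direct child only
    )


# The inner shell reports the CPU limit it inherited from the outer one.
result = run_with_ulimit(["sh", "-c", "ulimit -t"], cpu_seconds=7)
print(result.stdout.strip())
```

Note that the CPU limit applies to each process separately, so a filter that forks many short-lived children can still use more total CPU than the cap suggests.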