by doh 2114 days ago
Can you talk more about the specifics? What kind of parsers did you guys use? How about storage? How often did you update pages?

You should check out Manning's "Introduction to Information Retrieval". It has far more detail about web crawler architecture than I can write in a post, and it served as a blueprint for much of Applebot's early design decisions.
Nice, thanks for the recommendation!

The book is freely available online at https://nlp.stanford.edu/IR-book/information-retrieval-book....