by dvanduzer
2428 days ago
A crawler has two high-level options: parse the page or render the page. Most of our parser-based crawling is done by Heritrix (crawler.archive.org), and most of our render-based crawling is done by a proxy-based recorder similar to what you theorize (https://github.com/internetarchive/brozzler).
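
To make the parse-versus-render distinction concrete, here is a minimal sketch of the parse-based side using only the Python standard library. It is an illustration, not how Heritrix or brozzler actually work; the names `LinkExtractor`, `extract_links`, and `SAMPLE`, and the example.org URL, are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href> attributes in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse-based discovery: only sees links present in the markup."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one static link, one link created by a script at runtime.
SAMPLE = """
<a href="/about">About</a>
<script>
  var a = document.createElement("a");
  a.href = "/loaded-by-js";
  document.body.appendChild(a);
</script>
"""

print(extract_links(SAMPLE, "https://example.org/"))
```

The parser finds the static `/about` link but never sees `/loaded-by-js`, because that link only exists after the script runs. A render-based crawler loads the page in a real browser and records the requests it makes, which is why the two approaches complement each other.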