by PuerkitoBio
4444 days ago
I've written a couple of "polite" crawlers in Go (i.e. they obey robots.txt and delay between requests to the same host):

- Fetchbot: https://github.com/PuerkitoBio/fetchbot
  Flexible, with an API similar to net/http (it uses a Handler interface with a simple mux provided, supports middleware, etc.).
- gocrawl: https://github.com/PuerkitoBio/gocrawl
  Higher-level; more of a framework than a library.

Coupled with goquery (https://github.com/PuerkitoBio/goquery) to scrape the DOM (well, the net/html nodes), these make custom scrapers trivial to write.

(Sorry for the self-promoting comment, but this is quite on topic.)

edit: polite crawlers, not scrapers.