by gojomo 6486 days ago
Thanks for the plug!

As a developer of Heritrix, I can't honestly say it's compact or written in Python, but it is well-behaved, highly customizable (both through settings and through many Java extension points), and capable of high-volume crawling for many purposes.

You could also embed Python code via Jython with a little work, if necessary.
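
For anyone wondering what that might look like, here is a rough sketch of calling a Python snippet from Java through the standard `javax.script` API, which Jython plugs into when its jar is on the classpath. Nothing here is Heritrix's own API: the class name `JythonHookSketch`, the `evalPython` helper, and the `uri` and `result` variable names are made up for illustration, and wiring this into an actual Heritrix extension point is left out.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical sketch: not part of Heritrix. Shows only the generic
// Java-to-Jython bridge via the standard scripting API (JSR-223).
public class JythonHookSketch {

    /**
     * Evaluates a Python snippet that reads the variable `uri` and assigns
     * a value to `result`. Returns null when no Python engine is registered,
     * i.e. when the Jython jar is not on the classpath.
     */
    static Object evalPython(String source, String uri) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("python");
        if (engine == null) {
            return null; // assumption: Jython is an optional dependency
        }
        engine.put("uri", uri);      // expose a Java value to the Python code
        engine.eval(source);         // run the Python snippet
        return engine.get("result"); // read back what the snippet assigned
    }

    public static void main(String[] args) throws ScriptException {
        Object verdict = evalPython(
                "result = uri.endswith('.html')",
                "http://example.com/index.html");
        System.out.println(verdict == null
                ? "no Python engine found"
                : "verdict: " + verdict);
    }
}
```

Going through `javax.script` keeps the Java side free of compile-time Jython imports, so the same class still compiles and runs (reporting that no engine was found) when Jython is absent.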