For regular crawling:
I found Anemone (http://anemone.rubyforge.org/) to be a lovely framework for single-page crawls.
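To give a feel for it, here is a minimal sketch of an Anemone crawl. The helper names `summarize` and `crawl_site`, the example URL, and the depth limit of 2 are my own assumptions for illustration, not anything Anemone prescribes; `Anemone.crawl`, `on_every_page` and the `:depth_limit` option come from the gem itself.

```ruby
# Format one crawled page as "<status> <url>".
# `page` is anything that responds to #code and #url (Anemone::Page does).
def summarize(page)
  "#{page.code} #{page.url}"
end

# Crawl start_url and print one line per fetched page.
# depth_limit is an illustrative default; adjust to taste.
def crawl_site(start_url, depth_limit: 2)
  require 'anemone' # loaded lazily; needs `gem install anemone`

  Anemone.crawl(start_url, depth_limit: depth_limit) do |anemone|
    anemone.on_every_page do |page|
      puts summarize(page)
    end
  end
end
```

Calling `crawl_site("http://www.example.com/")` should then follow links from that page up to two levels deep and print each page's status code and URL.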
Other interesting candidates:
https://github.com/hasmanydevelopers/RDaneel
http://www.redaelli.org/matteo-blog/projects/ebot/
http://nutch.apache.org/ (meh, Java)