Hacker News
by Ledio 5854 days ago
Nutch has a full-on web crawler and a lot of features, and it scales pretty well. You can whitelist or blacklist URLs as you see fit, and filter out unwanted content.
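
To give a rough idea of what whitelisting and blacklisting URLs looks like, here is a minimal Python sketch of a first-match-wins regex filter. It is only modeled on the "+pattern / -pattern" rule style that Nutch's regex URL filter config uses, it is not Nutch's actual code. The rules, the example.org domain, and the names `parse_rules` and `accept` are all made up for illustration.

```python
import re

# Hypothetical rules: "-" rejects, "+" accepts, "#" starts a comment.
# These are illustrative and not copied from any real crawler config.
RULES = r"""
# skip common image/binary suffixes
-\.(gif|jpg|png|css|zip|exe)$
# skip URLs that look like queries
-[?*!@=]
# accept anything on example.org or its subdomains
+^https?://([a-z0-9-]+\.)*example\.org/
"""


def parse_rules(text):
    """Turn rule text into an ordered list of (allow, compiled_regex)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accept(url, rules):
    """Return the verdict of the first rule that matches; reject if none do."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


RULESET = parse_rules(RULES)

if __name__ == "__main__":
    for url in (
        "http://www.example.org/index.html",
        "http://example.org/logo.png",
        "http://example.org/search?q=1",
        "http://other.com/",
    ):
        print(url, "->", "crawl" if accept(url, RULESET) else "skip")
```

Order matters in this sketch: the reject rules sit above the accept rule, so an image on an otherwise whitelisted host still gets skipped.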