by alex_c 6647 days ago
Even though it's actually a testing tool, you might have some luck with Canoo Webtest + Groovy (http://webtest.canoo.com). Webtest uses HtmlUnit, which has pretty good JavaScript support and parses the page for you, so you don't have to mess with regexps to get around the document structure. Groovy lets you use an actual programming language rather than the awkward Ant-based syntax of Webtest. It takes some getting used to, and I haven't used it for web scraping myself, but it's a pretty powerful combination.
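
To make the "no regexps" point concrete, here is a rough sketch of pulling table cells out of a page by walking the parsed tags instead of pattern-matching the raw markup. This is not HtmlUnit or Webtest: it is a stand-in using Python's standard-library html.parser, which does not run JavaScript at all. The class name PriceTableParser, the helper extract_rows, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class PriceTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in html as lists of cell text."""
    parser = PriceTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup; a real page would be fetched over HTTP first.
SAMPLE_HTML = """
<table id="quotes">
  <tr><td>ACME</td><td>12.50</td></tr>
  <tr><td>GLOBEX</td><td><b>7</b>.25</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
```

Nested markup inside a cell (the <b> in the second row) is handled without any special casing, which is the sort of thing that tends to break a regexp. A page that builds its content with JavaScript would still need something like HtmlUnit to run the scripts first.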

Thanks, I'll give it a try. I'm collaborating on a project which involves getting info from online financial markets, btw, but it's getting held up because of this scraping problem. So new ideas might help get it moving again.