by jdvolz 6646 days ago
I've written a lot of programs like this over the last 18 months. People need this all the time, and I would say there isn't yet a tool that does it to the level customers want.

I use Mechanize, in both its Ruby and Python forms (I prefer Ruby), plus plain old regular expressions to get the information I want. I often use a divide-and-conquer strategy: remove part of the web page (for example, the <head>), then keep paring the rest down until only what I really want is left.
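
To make the paring-down idea concrete, here is a minimal sketch in Python using only the standard `re` module (no Mechanize, so nothing is fetched). The page content, the `prices` table id, and the function names `pare_down` and `extract_rows` are all made up for illustration; a real page would need its own patterns.

```python
import re

# Hypothetical page; in practice this would be the body of a fetched response.
PAGE = """<html>
<head><title>Prices</title><script>var x = "<td>decoy</td>";</script></head>
<body>
<div id="nav"><a href="/">Home</a></div>
<table id="prices">
<tr><td>Widget</td><td>$4.50</td></tr>
<tr><td>Gadget</td><td>$12.00</td></tr>
</table>
<div id="footer"><table><tr><td>Contact</td><td>n/a</td></tr></table></div>
</body>
</html>"""


def pare_down(html):
    # Step 1: drop the <head>, which can contain markup-like strings in scripts.
    body = re.sub(r"<head\b.*?</head>", "", html, flags=re.DOTALL | re.IGNORECASE)
    # Step 2: narrow to the one table we care about (assumed id: "prices").
    match = re.search(r'<table id="prices">(.*?)</table>', body, flags=re.DOTALL)
    return match.group(1) if match else ""


def extract_rows(fragment):
    # Step 3: with the noise gone, a simple per-row pattern is enough.
    return re.findall(r"<tr><td>(.*?)</td><td>(.*?)</td></tr>", fragment)


rows = extract_rows(pare_down(PAGE))
for name, price in rows:
    print(name, price)
```

The point of doing it in stages is that each regular expression stays simple, because it only has to be correct for the fragment left over from the previous step, not for the whole page.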

JavaScript can be a problem. What I normally do is read the JavaScript on the page and then recreate its behavior in my Ruby code. Often this just means setting some form values (usually hidden ones) and then submitting the form.
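
As a rough illustration of recreating that behavior by hand, the sketch below uses Python's standard `html.parser` and `urllib.parse` rather than Mechanize or Ruby. The form, its `page` and `token` fields, and the `onclick` handler are invented for the example, and the code only builds the request body; it does not send anything.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical form: the "Next" link uses JavaScript to bump a hidden field.
FORM_HTML = """<form action="/search" method="post">
<input type="hidden" name="page" value="1">
<input type="hidden" name="token" value="abc123">
<input type="text" name="q" value="">
<a href="#" onclick="document.forms[0].page.value = 2; document.forms[0].submit()">Next</a>
</form>"""


class FormFields(HTMLParser):
    """Collects the form action and the name/value pairs of its inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value") or ""


def next_page_request(html):
    parser = FormFields()
    parser.feed(html)
    # Do by hand what the onclick handler would have done in a browser.
    parser.fields["page"] = "2"
    return parser.action, urlencode(parser.fields)


action, body = next_page_request(FORM_HTML)
print(action, body)
```

The resulting `action` and `body` are what you would POST with whatever HTTP client you are using. Other hidden values such as the token are passed through untouched, which is usually what the server expects.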