by petercooper 6429 days ago
For Ruby, consider Scrubyt: http://scrubyt.org/

Why? Consider this script, which "learns" how to scrape Google results from a single supplied example of the output data:

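  require 'rubygems'
  require 'scrubyt'

  google_data = Scrubyt::Extractor.define do
    # Load the search page, type the query into the 'q' field, submit the form
    fetch 'http://www.google.com/ncr'
    fill_textfield 'q', 'ruby'
    submit

    # The one supplied example: the text of a single result link
    link "Ruby Programming Language" do
      # For each matched link, pull out its href attribute
      url "href", :type => :attribute
    end

    # Follow the "Next" link, with the limit set to 2
    next_page "Next", :limit => 2
  end

  puts google_data.to_xml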

The scraping part reads almost like English!
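
To give a rough feel for what "learning from one example" could mean, here is a toy sketch in plain Ruby. It is not Scrubyt's code and I'm not claiming Scrubyt works this way internally. The Node struct, the helper names (node, tag_path_to, nodes_at) and the little page tree are all made up for illustration. The idea: find the element whose text matches the example, remember the chain of tag names leading to it, then collect everything else sitting at the same chain.

```ruby
# Toy illustration only; NOT how Scrubyt is implemented.
# Node, node, tag_path_to and nodes_at are hypothetical names.
Node = Struct.new(:tag, :attrs, :text, :children)

def node(tag, attrs = {}, text = nil, children = [])
  Node.new(tag, attrs, text, children)
end

# Depth-first search for the node whose text equals the example.
# Returns the tag names from the root down to that node, or nil.
def tag_path_to(root, example_text, path = [])
  here = path + [root.tag]
  return here if root.text == example_text
  root.children.each do |child|
    found = tag_path_to(child, example_text, here)
    return found if found
  end
  nil
end

# Collect every node reachable through the same sequence of tag names.
def nodes_at(root, tag_path)
  return [] unless root.tag == tag_path.first
  return [root] if tag_path.length == 1
  root.children.flat_map { |child| nodes_at(child, tag_path.drop(1)) }
end

# A made-up stand-in for a parsed results page.
page = node('body', {}, nil, [
  node('div', {}, nil, [
    node('a', { 'href' => 'http://www.ruby-lang.org/' }, 'Ruby Programming Language')
  ]),
  node('div', {}, nil, [
    node('a', { 'href' => 'http://example.com/other' }, 'Another result')
  ]),
  node('p', {}, nil, [
    node('a', { 'href' => 'http://example.com/footer' }, 'Footer link')
  ])
])

# "Learn" the path from the single example, then generalize it.
learned_path = tag_path_to(page, 'Ruby Programming Language')
urls = nodes_at(page, learned_path).map { |n| n.attrs['href'] }
puts urls
```

With one example string, the sketch picks up both result links under body/div/a and skips the footer link under body/p/a. A real page needs something much more forgiving than an exact tag chain, which is the hard part a library takes off your hands.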