Hacker News
by goostavos 4935 days ago
In general, if you're going the mechanize route, .retrieve() is the function you're looking for.

e.g.

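  import mechanize
  br = mechanize.Browser()
  # retrieve() saves the URL to a local file and returns (filename, headers); [0] is the filename
  filename = br.retrieve("https://www.google.com/images/srpr/logo3w.png", "google_logo.png")[0]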

Mechanize doesn't really have proper documentation, but just about everything you need can be figured out from the very lengthy examples page on its site.
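If you don't need mechanize's browser state (cookies, forms, history), the standard library's urllib.request.urlretrieve() follows the same pattern and also returns a (filename, headers) tuple. Below is a minimal sketch of that alternative. The helper name download() and the temporary file paths are hypothetical, and a local file:// URL stands in for a real image host so the example runs offline.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download(url: str, dest: str) -> str:
    """Save the resource at `url` to `dest` and return the local filename.

    Hypothetical helper: urlretrieve() returns (filename, headers),
    so [0] picks out the filename, as in the mechanize example.
    """
    return urllib.request.urlretrieve(url, dest)[0]


# Offline demo: create a fake "image" and fetch it through a file:// URL.
src_dir = tempfile.mkdtemp()
src = Path(src_dir, "logo.png")
src.write_bytes(b"\x89PNG fake bytes")

saved = download(src.as_uri(), os.path.join(src_dir, "copy.png"))
print(saved)
```

For a real site, you would pass the image's http(s) URL instead of src.as_uri().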

Playing with it now, and while it seems to meet my download needs, I can't seem to get it to play nice with sites that are JavaScript-dependent. Am I missing something, or is there a way to plug in an underlying WebKit engine?
PhantomJS can download binary content from JavaScript-dependent sites, but getting it working is a journey because it isn't an out-of-the-box feature. Instead, use CasperJS to drive Phantom and you get a ton of snazzy features, including simple binary downloads. Happy scraping!