Ask HN: Does anyone have experience scraping dynamic content?

3 points by sid_viswanathan 5051 days ago

http://www.nfl.com/scores

For example, for this URL, there is a section on the right for "Big Play Highlights" and the first one listed is "L.Brown 7-yard TD pass from..."

It looks like this data is being loaded via some kind of AJAX call. Do you have any ideas on how I can scrape this stream of highlights data? I've never tried to scrape any dynamic content in the past.

Ideas?

4 comments

AznHisoka 5048 days ago

PhantomJS/CasperJS is what I've used for my current scraping projects. They're headless browsers and imitate a browser session. Just specify a fake user agent like Mozilla, and you're good to go.

bartonfink 5051 days ago

You MAY be able to do this with Selenium, although I've never used Selenium to scrape streaming requests. It's going to depend quite a bit on how the page is structured.

I have used Selenium to scrape dynamic content before by waiting for new DOM elements to be populated by AJAX, so I know this sort of thing is possible in a way you couldn't do with wget.
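A rough sketch of that wait-for-the-DOM idea, in Python. It assumes only that the driver exposes a Selenium-style `find_elements(by, value)` method, and "css selector" is the locator string behind Selenium's `By.CSS_SELECTOR`. The `FakeDriver` class and the `.highlight` selector are made up for illustration, so the example runs without a browser; they are not taken from the NFL page.

```python
import time


def wait_for_elements(driver, css_selector, timeout=10.0, poll_interval=0.5):
    """Poll the page until at least one element matches, or give up.

    `driver` is anything with a Selenium-style find_elements(by, value)
    method (an assumption of this sketch).
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements("css selector", css_selector)
        if elements:
            return elements
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"nothing matched {css_selector!r} within {timeout}s"
            )
        time.sleep(poll_interval)


class FakeDriver:
    """Hypothetical stand-in for a browser: content 'arrives' on the third poll."""

    def __init__(self):
        self.polls = 0

    def find_elements(self, by, value):
        self.polls += 1
        return ["L.Brown 7-yard TD pass"] if self.polls >= 3 else []


fake = FakeDriver()
found = wait_for_elements(fake, ".highlight", timeout=5.0, poll_interval=0.01)
```

With a real Selenium driver you would load the page with `driver.get(url)` first and pass the driver in the same way. Selenium also ships its own `WebDriverWait` helper that does this kind of polling, which is probably the better choice in practice.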

jfaucett 5051 days ago

Yeah, just look at the HTTP headers. Here's the call: http://www.nfl.com/liveupdate/scores/bigPlayVideos.json?rand...

Just make your own timestamp for the random value.
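A minimal sketch of calling the JSON endpoint directly, using only the Python standard library. The query parameter name is cut off in the URL above, so it is passed in as an argument rather than guessed. The `example.com` URL and the `ts` parameter are placeholders, and a millisecond timestamp is an assumption about what the server accepts.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode


def with_timestamp(url, param_name, now=None):
    """Append a millisecond timestamp as a cache-busting query parameter."""
    stamp = int((time.time() if now is None else now) * 1000)
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{urlencode({param_name: stamp})}"


def fetch_json(url, user_agent="Mozilla/5.0"):
    """GET the URL and decode the response body as JSON."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint and parameter name, not the real NFL ones.
example_url = with_timestamp(
    "https://example.com/feed.json", "ts", now=1700000000.0
)
```

To poll a live feed, you would call `fetch_json(with_timestamp(feed_url, param_name))` on whatever interval you need, with a fresh timestamp on each request so that caches are bypassed.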

bonsai 5047 days ago

HtmlUnit has very good support for javascript. http://htmlunit.sourceforge.net/
