by jplehmann 5197 days ago
http://coursera.org is creating some fantastic, free educational videos (algorithms, machine learning, natural language processing, SaaS).

This script allows one to batch download videos for a Coursera class. Given a class name and related cookie file, it scrapes the course listing page to get the week and class names, and then downloads the related videos into appropriately named files and directories.
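
For readers curious what the scraping step might look like, here is a minimal sketch, not the script's actual code. It assumes a hypothetical markup in which each week is an <h3> heading and each lecture is an <a class="lecture"> link; the real listing page may be structured differently. The names ListingParser and parse_listing are made up for illustration.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects (week, lecture_title, url) tuples from a listing page.

    Assumed (hypothetical) markup: every week is an <h3>, and every lecture
    is an <a class="lecture" href="..."> that appears after its week heading.
    """

    def __init__(self):
        super().__init__()
        self.lectures = []     # results, in page order
        self._week = ""        # most recent week heading seen
        self._in_week = False  # True while inside an <h3>
        self._href = None      # set while inside a lecture link
        self._text = []        # text fragments of the current element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3":
            self._in_week = True
            self._text = []
        elif tag == "a" and attrs.get("class") == "lecture":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that belongs to a week heading or a lecture link.
        if self._in_week or self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_week:
            self._week = "".join(self._text).strip()
            self._in_week = False
        elif tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.lectures.append((self._week, title, self._href))
            self._href = None


def parse_listing(html):
    """Return the lectures found in an already-downloaded listing page."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.lectures
```

Fetching the page with the cookie file and downloading each URL are left out here; the sketch only covers turning the listing HTML into week and lecture names.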

Why is this helpful? Before I was using wget, but I had the following problems:

  1. Video names have a number in them, but this does not correspond to the
     actual order.  Manually renaming them is a pain.
  2. Using names from the syllabus page provides more informative names.
  3. Using wget in a for loop picks up extra videos which are not posted/linked,
     and these are sometimes duplicates.
Naming is intentionally verbose, so that it will display and sort properly using MX Video on my Android phone.
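
To illustrate the naming idea, here is a rough sketch of how verbose, sortable paths could be built from the syllabus names. The exact scheme below (zero-padded week and lecture indices, underscores for punctuation) is an assumption for illustration and not necessarily what the script produces; clean and target_path are hypothetical helper names.

```python
import os
import re


def clean(name):
    """Collapse anything that is not a letter or digit into underscores."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


def target_path(class_name, week_idx, week, lecture_idx, title, ext="mp4"):
    """Build a verbose path whose parts sort in syllabus order.

    Zero-padded indices keep "10" after "09" in players that sort names
    as plain strings.
    """
    week_dir = "%02d_%s" % (week_idx, clean(week))
    filename = "%s_%02d_%02d_%s.%s" % (
        clean(class_name), week_idx, lecture_idx, clean(title), ext)
    return os.path.join(clean(class_name), week_dir, filename)
```

Repeating the class name and week number in the file name is redundant on disk, but it keeps files identifiable and ordered when a player shows them in one flat list.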

Inspired in part by youtube-dl (http://rg3.github.com/youtube-dl), with which I've downloaded many other good videos, such as those from Khan Academy.

Let me know if you like it.

1 comments

Awesome! I was actually planning on writing such a script over the weekend. I haven't taken a look at this semester's courses, but I know last semester the quizzes and tests were quite useful for someone with no previous practice in the subject at hand. I can see your script doesn't try to get all that, right?

In that case I'll still have a weekend project.

In the NLP class there are programming assignments with special formatting, headers, etc. I kind of want to write a script that uses NLP to snag NLP's programming instruction pages (as well as example code, etc.) Seems like that would be fun to do.
But in that case wouldn't you be looking to get the essence, the useful plain text of an HTML document? If so, wouldn't parsing with regular expressions or something similar be better than NLP? I haven't really done scraping and parsing of documents/text, so I'm not too sure.
It's possible yeah, though I like the formatting and highlighting and borders etc, it groups the different sections of the instructions together.

I see what you mean though, it's not really full NLP either way, I just used that term in place of regular expressions because it was in the NLP class that I learned about them (first homework is a phone and email scraper.) Probably my fault for using semantics wrong.
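
For anyone wondering what a regex-based phone and email scraper looks like, here is a small sketch. It is not the homework solution: the patterns are simplified assumptions that only handle plain addresses and US-style ten-digit numbers, and the names EMAIL_RE, PHONE_RE and scrape are made up for this example.

```python
import re

# Simplified patterns; real-world text needs many more cases
# (obfuscated addresses, extensions, international numbers).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})\b")


def scrape(text):
    """Return (emails, phones); phones are normalized to NNN-NNN-NNNN."""
    emails = EMAIL_RE.findall(text)
    phones = ["-".join(m.groups()) for m in PHONE_RE.finditer(text)]
    return emails, phones
```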

Only support for videos right now.
Update -- now downloads all lecture materials on the videos page (pptx, pdf, etc).