http://coursera.org is creating some fantastic, free educational videos (algorithms, machine learning, natural language processing, SaaS).
This script allows one to batch download videos for a Coursera class. Given a class name and related cookie file, it scrapes the course listing page to get the week and class names, and then downloads the related videos into appropriately named files and directories.
Why is this helpful? Previously I was using wget, but I had the following problems:
1. Video names have a number in them, but this does not correspond to the
actual order. Manually renaming them is a pain.
2. Using names from the syllabus page provides more informative names.
3. Using wget in a for loop picks up extra videos which are not posted/linked,
and these are sometimes duplicates.
Naming is intentionally verbose, so that it will display and sort properly using MX Video on my Android phone.
Inspired in part by youtube-dl (http://rg3.github.com/youtube-dl), with which I've downloaded many other good videos, such as those from Khan Academy.
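To illustrate the naming idea, here is a minimal sketch. It is not the script's actual code: `safe_name` and `video_path` are hypothetical helpers, and the exact layout (zero-padded week and lecture numbers followed by the syllabus titles) is an assumption about what "verbose" naming could look like.

```python
import os
import re


def safe_name(text):
    """Collapse anything that is awkward in a file name into underscores."""
    cleaned = re.sub(r"[^\w\-.]+", "_", text.strip())
    return cleaned.strip("_")


def video_path(week_index, week_title, lecture_index, lecture_title, ext="mp4"):
    """Build a directory/file path whose plain string sort matches syllabus order.

    Zero-padding the numbers means lecture 2 sorts before lecture 10,
    which is what simple players and file browsers rely on.
    """
    directory = f"{week_index:02d}_{safe_name(week_title)}"
    filename = (
        f"{week_index:02d}_{lecture_index:02d}_{safe_name(lecture_title)}.{ext}"
    )
    return os.path.join(directory, filename)
```

Repeating the week number in the file name keeps the order intact even if a player flattens all the directories into a single list.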
Awesome! I was actually planning on writing such a script over the weekend. I haven't taken a look at this semester's courses, but I know last semester the quizzes and tests were quite useful for someone with no previous practice in the subject at hand. I can see your script doesn't try to get all that, right?
In the NLP class there are programming assignments with special formatting, headers, etc. I kind of want to write a script that uses NLP to snag NLP's programming instruction pages (as well as example code, etc.) Seems like that would be fun to do.
But in that case wouldn't you be looking to get the essence, the plain-text useful stuff of an HTML document, in which case wouldn't parsing with regular expressions or something similar be better than NLP? I haven't really done scraping and parsing of documents/text so I'm not too sure.
It's possible, yeah, though I like the formatting, highlighting, borders, etc., since they group the different sections of the instructions together.
I see what you mean though, it's not really full NLP either way, I just used that term in place of regular expressions because it was in the NLP class that I learned about them (first homework is a phone and email scraper.) Probably my fault for using semantics wrong.
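For anyone curious what that kind of regex scraper looks like, here is a minimal sketch. It is not the homework solution: the patterns are simplified assumptions that only cover plain email addresses and US-style ten-digit phone numbers, and `scrape_contacts` is a hypothetical name.

```python
import re

# Simplified patterns: real-world addresses and numbers have many more forms.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")


def scrape_contacts(text):
    """Return (emails, phones) found in a block of text."""
    return EMAIL_RE.findall(text), PHONE_RE.findall(text)
```

Obfuscated forms such as "jane at example dot edu" would need extra patterns on top of these.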
Some shameless self-promotion: I wrote a Chrome extension for downloading Udacity videos (http://nzmsv.github.com/udacity-dl/). If there's any interest in a batch version I could look into it. Alternatively, feel free to write it and let me know :)
You can also check my script over here: https://github.com/fvieira/coursera_resources_downloader
It has the advantage of not requiring a cookies file, since it can authenticate with your username and password.
Otherwise, it does pretty much the same as jplehmann's script, although with some minor changes which you might or might not like.
By the way, congratulations on your script, jplehmann! Wish I had found yours before losing time doing mine...
I'm already using it. After some time I got a "connection forcibly closed by remote host" error. I can't access the Coursera website either, not sure why though. (Maybe a bunch of people suddenly using this script crashed their servers? Or they blocked us.)
It's back up, must have been a small glitch. Might I add that I love the fact the script picked up on the video I dropped earlier.
Right now, if you kill the script it will remove any file being currently downloaded to remove partials. I'm not sure if that happens for other failure conditions. I have added an issue for this: https://github.com/jplehmann/coursera/issues/1
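One way to cover other failure conditions as well is to download into a temporary `.part` file and only rename it once the transfer completes. The following is a rough sketch of that general pattern, not what the script currently does; the `download` function and its `opener` parameter are hypothetical names introduced for illustration.

```python
import os
import shutil
from urllib.request import urlopen


def download(url, dest, opener=urlopen):
    """Download url to dest, never leaving a partial file behind.

    Data is written to dest + ".part" first. The rename to dest happens
    only after the whole body has been copied, so dest is either complete
    or absent.
    """
    part = dest + ".part"
    try:
        with opener(url) as response, open(part, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(part, dest)
    except BaseException:
        # BaseException also covers Ctrl-C (KeyboardInterrupt).
        if os.path.exists(part):
            os.remove(part)
        raise
```

Because the final name only ever appears for complete files, a later run can skip anything that already exists and re-fetch the rest. A hard kill (for example SIGKILL) can still leave a `.part` file, but it is easy to spot and delete.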
Nice! I actually found your project last week through Google but wrote my own in JS (https://gist.github.com/2225519) after struggling with the Python dependencies.
I think Coursera really needs to come out with a native solution and a standard way of numbering/organizing videos.
Thank heavens! Er, I mean thank you (the OP) for this tool. I already wrote scripts that renamed files to something sane, but this will make my life so much easier.
Let me know if you like it.