by rwill128
3349 days ago
I would like to chat for sure. I ran across Anki a few months ago and started thinking about all the potentially incredible interactions between spaced repetition and Natural Language Processing techniques. For a super basic idea, imagine a Chrome plugin that creates an Anki deck out of a Wikipedia article at the click of a button. (Maybe not the most useful application, but pretty neat and a great place for a simple proof of concept.) And what if you took that a step further and made an app that does the same thing with text extracted from a set of images? You could take pictures of a textbook page or pages and have it build a deck from those. Of course, there are many practical challenges in developing an NLP process that can identify the most salient parts of a text and capture flash-card-friendly phrases or factual statements, but I think those challenges could be overcome. And it would be an amazing feat to capture the benefits of spaced repetition without the upfront cost of manual deck creation.
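To make the proof of concept a bit more concrete, here is a minimal sketch in Python of the "text in, deck out" step. Everything in it is an assumption for illustration: the `DEFINITION` pattern, the pronoun filter, and the function names are hypothetical, and a regex that picks out "X is Y" sentences is a crude stand-in for real NLP. The output is tab-separated text, which Anki can import as front/back cards.

```python
import csv
import io
import re

# Hypothetical heuristic: "<Subject> is/are/was/were <rest>." looks like a
# definitional sentence worth turning into a card.
DEFINITION = re.compile(
    r"^(?P<subject>[A-Z][^,;:]{1,60}?) (?P<verb>is|are|was|were) (?P<rest>.+)\.$"
)
# Subjects that make useless questions ("What was It?").
PRONOUNS = {"it", "this", "that", "they", "he", "she", "these", "those", "there"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_cards(text):
    # Return (front, back) pairs for every sentence matching the heuristic.
    cards = []
    for sentence in split_sentences(text):
        m = DEFINITION.match(sentence)
        if m and m["subject"].lower() not in PRONOUNS:
            cards.append((f"What {m['verb']} {m['subject']}?", sentence))
    return cards


def to_tsv(cards):
    # One card per line, front and back separated by a tab.
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(cards)
    return buf.getvalue()


if __name__ == "__main__":
    article = (
        "Anki is a flashcard program. It was released in 2006. "
        "Spaced repetition is a learning technique."
    )
    print(to_tsv(make_cards(article)), end="")
```

The hard part mentioned above, picking the most salient statements, is exactly what this sketch skips: it keeps every sentence that fits the pattern and has no notion of importance.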
I don't think NLP is necessary though.
If you take a look at the Stack Overflow data dump, sooner or later every possible variation of a question for a particular answer is going to get asked. Of course there are always new topics that haven't been covered, but these are a minuscule portion of the entire data set. Looking at the Q&A happening on sites like Quora, Reddit, etc., I think it's a safe bet that in a couple of years the chances of someone coming up with a question that no one has asked before are going to be very low.
Wikipedia is missing 2 important pieces.
1. Linkage between all these questions and the content on the site. Currently Google provides this link.
2. A system to communicate to the reader what skill level/pre-reqs they need to fully appreciate/understand the content they are looking at. The UI for such a system already exists in most games.
Once these pieces are in place you are ready to create a very useful Anki deck on anything.
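As a rough illustration of how those two pieces could fit together, here is a small Python sketch. The `Article` structure, the integer skill levels, and the function names are all made up for this example, not taken from any existing site or tool. Each article carries the questions linked to it (piece 1) plus a level and pre-reqs (piece 2), and the deck is built only from articles the reader is ready for.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    level: int  # hypothetical skill level, 1 = beginner
    prereqs: list = field(default_factory=list)    # titles to know first
    questions: list = field(default_factory=list)  # linked (question, answer) pairs


def ready_to_read(article, known_titles, reader_level):
    # Unlocked only if the level fits and every pre-req is already known.
    return article.level <= reader_level and all(
        p in known_titles for p in article.prereqs
    )


def build_deck(articles, known_titles, reader_level):
    # Collect the linked Q&A pairs from every unlocked article.
    deck = []
    for article in articles:
        if ready_to_read(article, known_titles, reader_level):
            deck.extend(article.questions)
    return deck


if __name__ == "__main__":
    articles = [
        Article("Variables", 1, [], [("What is a variable?", "A named value.")]),
        Article(
            "Closures",
            2,
            ["Variables"],
            [("What is a closure?", "A function plus its captured scope.")],
        ),
    ]
    for front, back in build_deck(articles, {"Variables"}, reader_level=2):
        print(front, "->", back)
```

This works much like an unlock tree in a game: content stays locked until the reader's level and completed pre-reqs allow it.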
What most people don't realize is that the entire mass of human knowledge, going by the size of most wiki/Q&A site dumps, is about 100-150 GB. Throw in all the educational video content being produced on Khan Academy, NPTEL, etc. and you get to 1-2 TB. This isn't a big amount of data. All it needs is a learning system built on top of it for it all to be put to good use.