by jondot
4974 days ago
Sorry, but this isn't rocket science at all. Standard clustering algorithms (found in any off-the-shelf natural language processing library) and text summarization with libots should handle most of the heavy lifting (see http://tldr.it/ and http://libots.sourceforge.net/). Also, the first paragraph of most news articles is already a practical summary, even if you may not have noticed.

Speaking from an NLP background: unless you can influence the source, and here the source is the Web, you'll get 80/20 at best. You'll work VERY hard to fix the remaining 20%, and you WILL still be left with some content you just can't summarize properly. What would really make a difference is summarization driven by real people, not machines (see what voicebunny did for text-to-speech, for example). And yes, it would be fun to combine the two as well.
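Extraction, in the sense used in this thread, means picking existing sentences out of the article rather than writing new ones. Below is a minimal sketch of that idea, plus the first-paragraph ("lead") baseline mentioned above. It assumes a naive regex sentence splitter and a tiny hand-picked stopword list, and it uses a generic word-frequency scorer; it is not the algorithm libots actually implements.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use much larger lists).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "was", "are", "be"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, n=2):
    """Extractive summary: return the n highest-scoring sentences in original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average document frequency of the sentence's content words.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so ties keep their original order even with reverse=True.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(ranked)]


def lead_summary(text, n=1):
    # Lead baseline: simply take the first n sentences.
    return split_sentences(text)[:n]


if __name__ == "__main__":
    article = "Cats are great pets. Cats sleep a lot. The stock market fell today."
    print(summarize(article, n=2))
    print(lead_summary(article))
```

Averaging over a sentence's content words keeps long sentences from winning just by being long. Anything abstractive, i.e. generating new sentences, is far beyond a sketch like this.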
What would amaze me is a good automatic summarization algorithm that uses abstraction, not just extraction.
Also, check out Circa (http://cir.ca/). I've never tried it, but from what I've read, it uses both humans and algorithms to "summarize" articles.