Hacker News
by MojoJolo 4974 days ago
I came across an article on The Verge whose content is mainly a video.

What would amaze me is a good automatic summarization algorithm that uses abstraction (writing new sentences) and not just extraction (picking existing ones).
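To make the distinction concrete, here is a minimal sketch of the extraction side, assuming a naive word-frequency scoring scheme. The function names, the stopword list, and the scoring rule are all illustrative assumptions, not any particular product's method. Note that it can only return sentences already in the text; an abstractive summarizer would have to generate new wording.

```python
import re
from collections import Counter

# Tiny stopword list, purely illustrative (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in",
             "and", "it", "that", "this", "on", "for"}

def split_sentences(text):
    # Naive split: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    # Lowercased words with the stopwords dropped.
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]

def extractive_summary(text, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))
    # A sentence's score is the sum of the document-wide counts of its words.
    scored = [(sum(freq[w] for w in content_words(s)), i)
              for i, s in enumerate(sentences)]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:n]
    keep = sorted(i for _, i in top)
    return " ".join(sentences[i] for i in keep)

text = "Cats sleep a lot. Cats and dogs are pets, and cats chase dogs. Birds fly."
print(extractive_summary(text, n=1))
```

Every sentence in the output is copied verbatim from the input, which is exactly the limitation of extraction.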

Also, check out circa (http://cir.ca/). I've never tried it, but from what I've read, it uses both humans and algorithms to "summarize" articles.

1 comments

circa is a good idea, actually. From my close experience with this field, when a news article is published it gets edited and republished many times, in many forms and shapes (Web, RSS, etc.). Many of these steps need manual, human work, and this limits the volume of published news.

Further, much of the news really originates from a relatively limited set of sources (Reuters, etc.), so you can plug your solution in there as well.

Therefore it should be OK to assume that if you put humans in the same pipeline to summarize news manually, the capacity and efficiency will be reasonable.

The problem with summarizing news manually is that it takes too much effort for a human. The efficiency may be good at first, but as more news comes in, it will go down (assuming only one person is doing the summarizing).

True, but my point is that people are already doing it at the start of the pipeline. Think about what happens when Reuters decides to offer its summarized content as SaaS. Even regardless of that, you can hire a battery of professional summarizers instead of PhDs and do it pretty well.

Where this doesn't apply, and where I do think you're completely right, is non-news articles: think blogs, tweets (although there's not much to summarize in 140 chars), product descriptions, scientific articles, etc. These things are produced in much higher volume and with much less workflow around them.