Show HN: gi.st – the gist of the web (thegi.st)
11 points by stkbach 3999 days ago
15 comments

The web is a big place. What's the strategy around seeding it with useful gists so it can gain critical mass? Have you thought about focusing on a niche (say, TED talks?) or possibly automatically summarizing pages? The latter could be tricky, as incorrectly or poorly summarized pages would drive users away from the site.
It does automatically summarize any text-heavy articles.

For example:

bbc.com/news thewashingtonpost.com huffingtonpost.com reuters.com cnn.com theglobeandmail.com theguardian.com en.wikipedia.org

Go to any of those as a start and submit one of the articles you find and everything suggested by @gist was generated by an algorithm.

Our hope is that this will help get the ball rolling. We are certainly very subject to network effects, so focusing on a niche and providing tremendous value to that one vertical is a good strategy.

Ah, your landing page should make that abundantly clear. That's a very cool feature.

I tried a couple of articles on Washington Post and the algorithm did a decent job. I then read the gist first before reading an article, and while some sentences were a little hard to understand w/o the surrounding context, I felt I still got a decent summary. I can see myself skimming the article through this service instead of relying solely on my eyeballs when scanning through.

Couple of suggestions:

- Linking the extracted sentences back to the original page if I want the surrounding context
- A tool/browser plug-in which can allow me to select a representative sentence from the story and submit to gist

Seems like a useful standalone service as it is; having people vote on and submit gists themselves would be the cherry on top.

Thanks! There is a browser plugin planned that will show you the gists of any url you navigate to, and additionally could show the gists of links on a page before you click on them...
Hey, seems neat, as a student it makes me wonder if it could help me as a search tool ... if it summarizes an article for me and 'the gist' is exactly what I need for a paper, can gist help me find other articles with similar 'gists' so I can use these in my paper?
The real purpose of gist is to help you decide what not to read. It's a better form of skimming.
A gist 'filter' or something would be easier, just on all the time, instead of cutting and pasting the url. Easy if it were a pop-up that kicks in automatically every time you go to a new web site or hover over a link - could fade out after 5 seconds if not hovered over?
Agreed granolagirl. Just a masking of the site should Gist be turned on, with a quick link or click away feature to the source.

First tests are pretty damn impressive though. Gist'd (is that even the right verb?) a couple of wiki pages and found the summary quite useful, and surprisingly, grammatically correct. Will stay tuned for sure.

:) it's grammatically correct because @gist uses an extraction-based strategy (i.e., it picks an existing representative sentence). If you want an abstract summary (i.e., a new representative sentence), best to ask a human for now.
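To make the extraction idea concrete, here is a minimal sketch of one common way to do it. This is not @gist's actual algorithm, which the thread doesn't describe; the word-frequency scoring and the names `split_sentences` and `extract_gists` are assumptions for illustration only. The point it shows is why the output stays grammatical: every gist is a sentence copied verbatim from the source.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_gists(text, count=1):
    """Return `count` existing sentences that best represent the text.

    Hypothetical scoring: a sentence is 'representative' if its words are,
    on average, frequent in the document as a whole.
    """
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            return 0.0
        return sum(freq[t] for t in tokens) / len(tokens)

    # Rank sentence positions by score, keep the best `count`,
    # then restore document order so the gists read naturally.
    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]),
                  reverse=True)[:count]
    return [sentences[i] for i in sorted(best)]
```

Because nothing is rewritten, the result can lose surrounding context but cannot introduce new grammar errors; an abstractive summary would need to generate new text instead.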
Yup. We'll release a chrome plugin shortly that provides the quick gist of whatever page you happen to be on, so as to avoid having to actually go to the site and submit a url. More convenient for the regular user.

In addition, we'll release the api so that people can build their own tools if they wish.

This will save me many hours trying to skim the goodness out of long reads. The chrome extension will be key for adoption.

I wonder what the feedback will be from big publications? Maybe they will take the hint and write more concise content...

This is awesome! No more TL;DR. The plugin or extension idea is also great and would be much easier to use and avoid the copy paste of urls. Congrats!!!
I like the concept and it would be great if we could personalize this app a bit more. For instance, I want a “History” tab that shows all my gists.
Great work. Very cool. A progress indicator when the content is being parsed would be nice.
Be nice if there was a "Latest" tab or something that showed what had recently been gisted.
+1 on the list!
Do you have plans for giving it the capability of analyzing images as well as text?
In the distant future, but as of right now we're focused more on text and getting humans to start participating. The machine learning approaches we're using are getting better every day, and the latest deep learning techniques are showing promise with pictures, so that is certainly a possibility down the road.
It tots works! Super cool, I could see myself getting addicted to this tool.
I wish it didn't offer so many 'gists' - three is enough
Ya, it returns about 1/6 of the sentences in the document as gists. For an item with 12 sentences, 2 is too few, but for something with 100 sentences, 17 is too many.

We'll probably change the ratio to scale in a smarter way rather than a constant fraction.
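As a rough illustration of the numbers above, the sketch below compares the constant 1/6 fraction with one possible "smarter" scaling. The square-root rule and the `minimum`/`maximum` bounds are assumptions picked for the example, not what gist will actually ship.

```python
import math


def constant_fraction(sentence_count, ratio=1 / 6):
    # Current behaviour as described in the thread: about 1/6 of the sentences.
    return round(sentence_count * ratio)


def sublinear(sentence_count, minimum=3, maximum=8):
    # Hypothetical alternative: grow with the square root of the length,
    # clamped to [minimum, maximum] and never more than the sentences available.
    if sentence_count <= 0:
        return 0
    wanted = round(math.sqrt(sentence_count))
    return min(sentence_count, max(minimum, min(maximum, wanted)))


for n in (12, 100):
    print(n, constant_fraction(n), sublinear(n))
```

With these assumed bounds, a 12-sentence item gets 3 gists instead of 2, and a 100-sentence item gets 8 instead of 17.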

Also, it seems to just be reiterating the title of an article. Can it delve into an article to summarize the content, regardless of what the author has titled it?
@telephoto...

Go to one of the following sites: www.bbc.com/news, washingtonpost.com, huffingtonpost.com, reuters.com, theglobeandmail.com

On those sites, find any text article that's medium+ size in length, and submit the article (not the home page of the news site) to gist.

You should see that it's not just reiterating the title.

On many sites it's quite buggy, and what you're seeing is likely the result of gist either a) attempting to gist something it shouldn't (like, say, the BBC News home page), or b) attempting to gist when the scrape/parse failed and there were too few or poorly parsed sentences.

We do extract the html title and stick it in at the top of the page as a reference, but that title, like each gist, is editable by the community, and serves only to act as a jumping off point.
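The two failure cases mentioned above (home pages, and parses that yield too few or badly broken sentences) could be screened out before gisting. The check below is only a sketch of that idea; the function name `should_gist` and all thresholds are made up for illustration and are not taken from gist itself.

```python
def should_gist(sentences, min_sentences=5, min_words=4, max_fragment_ratio=0.3):
    """Decide whether parsed output looks like a real article.

    All thresholds are hypothetical:
    - fewer than `min_sentences` sentences suggests a failed scrape;
    - many very short 'sentences' suggests menus or headline lists,
      which is typical of a news home page.
    """
    if len(sentences) < min_sentences:
        return False
    fragments = sum(1 for s in sentences if len(s.split()) < min_words)
    return fragments / len(sentences) <= max_fragment_ratio
```

A page that fails the check could fall back to showing just the title rather than a misleading gist.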

Very nice. I'd be interested in playing around with the api.
Ya, the website shown here doesn't have privileged access to the api. So, short of some basic rate limiting, we can release the same api the website uses to the public as a start.
good start! :)