by DanBC 4790 days ago
This is neat!

The article gives an example which I find a bit confusing.

>I ran it on this sentence -

> “Swayy is a beautiful new dashboard for discovering and curating online content.”

>And got this result -

> This sentence is about: Swayy, beautiful new dashboard, online content

That misses "discovering" and "curating", which I think are the most important parts of that sentence.

Nah, it filtered out the meaningless buzzwords.

It's a dashboard for online content. That pretty much implies the picking and finding of said online content to be displayed on the dashboard.

Huh? "Curating" and "discovering" may be overused tech verbs, but they are vital in describing what the "dashboard" does. For example, you would never describe the Google Analytics dashboard as something that curates or discovers.

And far worse than buzzwords are adjectives. Does "beautiful" add anything to that sentence?

I disagree. Dashboards curate by definition. Some car dashboards have a tachometer and some don't; it depends on the car's focus. Google Analytics similarly doesn't display all available information, just what will be useful. Dashboards always display a limited subset of the available information that makes sense in the current context, i.e. they are curated.

Discovering is also vacuous. The whole point of a dashboard is to convey information. Do I discover how fast I am going by looking at my car's dashboard? Do I discover my website's traffic by going to Google Analytics? Sure, and I wouldn't use them if I didn't get the information I need from them. Likewise, an online content dashboard that didn't deliver online content of some sort would be a waste of time.

Beautiful adds something because not all dashboards are beautiful.

Google Analytics has nothing whatsoever to do with online content.
You're missing the point. The OP is talking about a system for interpreting sentences in bulk and extracting useful keywords. "Beautiful" and "new" are not useful, and arguably "dashboard" is not particularly useful either. "Curating" and "discovering", while grating to our ears, are definitely words that describe purpose, because there are "dashboards" that have nothing to do with "curating". So ostensibly "curating" has some use as a keyword.
This is because he is only extracting the noun phrases from the sentence. If you adapted his code to tag verb phrases as well (by modifying the semi-CFG and the normalize_tags method), you could also extract "discovering" and "curating".
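
To make the idea concrete, here is a rough Python sketch. It is not the article's code: the tokens are hand-tagged with Penn Treebank style tags instead of being run through a tagger, and `extract_keywords`, `include_verbs` and the small `COPULAS` list are names I made up for illustration. It only shows how keeping verbs alongside noun phrases changes the output.

```python
from itertools import groupby

# Hand-tagged tokens (assumption: a real pipeline would get these from a POS tagger).
TAGGED = [
    ("Swayy", "NNP"), ("is", "VBZ"), ("a", "DT"), ("beautiful", "JJ"),
    ("new", "JJ"), ("dashboard", "NN"), ("for", "IN"), ("discovering", "VBG"),
    ("and", "CC"), ("curating", "VBG"), ("online", "JJ"), ("content", "NN"),
    (".", "."),
]

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
COPULAS = {"am", "is", "are", "was", "were", "be", "been"}  # hypothetical stop list


def kind(word, tag, include_verbs):
    """Classify a token as part of a noun phrase, a kept verb, or neither."""
    if tag in NOUN_TAGS or tag in ADJ_TAGS:
        return "np"
    if include_verbs and tag in VERB_TAGS and word.lower() not in COPULAS:
        return "vp"
    return None


def extract_keywords(tagged, include_verbs=False):
    """Return adjective/noun runs, plus non-copula verbs when asked."""
    keywords = []
    for k, run in groupby(tagged, key=lambda wt: kind(wt[0], wt[1], include_verbs)):
        run = list(run)
        if k == "np":
            # A noun phrase has to end in a noun, so trim trailing adjectives.
            while run and run[-1][1] not in NOUN_TAGS:
                run.pop()
            if run:
                keywords.append(" ".join(word for word, _ in run))
        elif k == "vp":
            keywords.extend(word for word, _ in run)
    return keywords


print(extract_keywords(TAGGED))
print(extract_keywords(TAGGED, include_verbs=True))
```

With `include_verbs=False` this gives the three noun phrases the article reports; with `include_verbs=True` it adds "discovering" and "curating" in sentence order.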
But this would miss the "main topics," since when you have both the vp's and the np's you have everything :/ Here is the resulting tree (it's unformatted, sorry, I tweaked an old Prolog grammar I had for analysing search keywords and tweets):

[[[[Swayy,snp],np],[is,[a,[[beautiful,new,dashboard,snp],np],np_],vp],simple_s],for,[[discovering,simple_s],and,[[curating,[[online,content,snp],np],vp],simple_s],s],s]
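
If it helps to read the tree, here is a small Python sketch that loads it and compares what the snp nodes cover with what the whole tree covers. I am assuming that the last element of every bracketed list is its label and the elements before it are its children; that is my reading of the output above, and `parse`, `words` and `phrases` are hypothetical helper names.

```python
import re

TREE = ("[[[[Swayy,snp],np],[is,[a,[[beautiful,new,dashboard,snp],np],np_],vp],"
        "simple_s],for,[[discovering,simple_s],and,[[curating,[[online,content,"
        "snp],np],vp],simple_s],s],s]")


def parse(text):
    """Turn the bracketed string into nested Python lists of strings."""
    tokens = re.findall(r"\[|\]|[^\[\],\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "[":
            return tok            # a bare word or a label
        items = []
        while tokens[pos] != "]":
            items.append(node())
        pos += 1                  # skip the closing bracket
        return items

    return node()


def words(node):
    """Collect the words under a node, skipping the trailing label of each list."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[:-1] for w in words(child)]


def phrases(node, label):
    """Collect the text of every subtree carrying the given label."""
    if isinstance(node, str):
        return []
    found = [" ".join(words(node))] if node[-1] == label else []
    for child in node[:-1]:
        found += phrases(child, label)
    return found


tree = parse(TREE)
print(phrases(tree, "snp"))   # just the simple noun phrases
print(" ".join(words(tree)))  # every word once NPs and VPs are both kept
```

The snp nodes give back the three "topics", while walking the whole tree just reproduces the sentence, which is the trade-off I meant.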