Hacker News
by bgraves 3238 days ago
Saw a pretty sweet demo of Tableau's home-grown prototype at their annual conference last November. It was surprisingly useful to be able to just speak "show me all of the 3-bedroom homes in the downtown Seattle area less than $400,000".

It was slow, but effective. I kept catching myself wanting to click around for the first few minutes, but quickly realized I didn't need to.

I did have to speak in a way that the NLP engine could understand (e.g. "four-hundred thousand dollars" instead of "four-hundred k"), so it still feels like I'm building a SQL query with my voice rather than just speaking an idea and letting the software figure out what I mean (a hard problem to solve, I know!).
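A normalization pass in front of the query builder is one way to make "four-hundred k" and "four-hundred thousand dollars" mean the same thing. Below is a minimal Python sketch, assuming a tiny hand-written vocabulary; `SMALL`, `SCALES`, `FILLER` and `parse_amount` are hypothetical names, not anything from Tableau's prototype, and only whole-dollar amounts are handled.

```python
import re

# Hypothetical, deliberately tiny vocabulary: enough for the examples in this
# thread, not a general-purpose number parser.
SMALL = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
}
# Scale words, including the informal "k" the demo could not handle.
SCALES = {"thousand": 1_000, "k": 1_000, "grand": 1_000,
          "million": 1_000_000, "mil": 1_000_000}
# Words that carry no numeric value and are simply skipped.
FILLER = {"dollars", "dollar", "bucks", "and"}


def parse_amount(text: str) -> int:
    """Normalize a spoken or typed money amount to a whole number of dollars."""
    s = text.lower().replace("$", "").replace(",", "").replace("-", " ")
    s = re.sub(r"(\d)([a-z])", r"\1 \2", s)  # split "400k" into "400 k"
    total, current = 0, 0
    for tok in s.split():
        if tok.isdigit():
            current += int(tok)
        elif tok in SMALL:
            current += SMALL[tok]
        elif tok == "hundred":
            current *= 100
        elif tok in SCALES:
            # Close off the current group, e.g. "four hundred" + "thousand".
            total += current * SCALES[tok]
            current = 0
        elif tok in FILLER:
            continue
        else:
            raise ValueError(f"unrecognized token: {tok!r}")
    return total + current
```

With something like this in place, "less than four-hundred k", "less than 400k" and "less than $400,000" would all reduce to the same `price < 400000` filter.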


> "show me all of the 3-bedroom homes in the downtown Seattle area less than $400,000"

Ha, that's an easy one. Empty result set!

Is there an open source NLP engine out there? I've been trying to learn this area and there are so many "potholes" and wrong paths... I've looked at OWL/SPARQL, graph DBs, logic programming, and rule-based systems. I feel like I'm dancing around the real topic and I don't know what "it" is :'(
I played around with this for a weekend or two, so my knowledge of the matter is not exhaustive, but it all boiled down to having a good OLAP-ish data source in the first place.

- You can do named entity tagging based on the categorical data: text/string columns with low-ish relative cardinality make good candidates, and the cardinality cutoff filters out near-unique text fields such as email addresses (which shouldn't be in a DWH as categoricals in the first place).

- FLOATs/decimals/integers would be good candidates for the values somebody is looking for (and the name of the column would be the 'trigger' of the query).
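The two heuristics above can be sketched in plain Python without any NLP library. This is an illustration under stated assumptions: the sample rows, the column names and the 0.5 cardinality cutoff are all made up, and whitespace splitting stands in for a real tokenizer such as NLTK's or spaCy's.

```python
# Hypothetical fact-table sample; column names and values are invented.
ROWS = [
    {"country": "US", "channel": "web", "email": "a@example.com", "revenue": 120.0},
    {"country": "US", "channel": "store", "email": "b@example.com", "revenue": 80.5},
    {"country": "DE", "channel": "web", "email": "c@example.com", "revenue": 42.0},
    {"country": "DE", "channel": "web", "email": "d@example.com", "revenue": 17.25},
    {"country": "FR", "channel": "store", "email": "e@example.com", "revenue": 63.0},
    {"country": "US", "channel": "web", "email": "f@example.com", "revenue": 99.9},
]

MAX_RELATIVE_CARDINALITY = 0.5  # assumption: would need tuning per warehouse


def profile_columns(rows, max_ratio=MAX_RELATIVE_CARDINALITY):
    """Split columns into categorical dimensions and numeric measures."""
    dimensions, measures = [], []
    for col in rows[0]:
        values = [r[col] for r in rows if r[col] is not None]
        if not values:
            continue
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            measures.append(col)
        elif all(isinstance(v, str) for v in values):
            # Distinct values relative to row count; emails score close to 1.0.
            if len(set(values)) / len(values) <= max_ratio:
                dimensions.append(col)
    return dimensions, measures


def build_lexicon(rows, dimensions):
    """Map each lower-cased dimension value to the column it came from."""
    lexicon = {}
    for col in dimensions:
        for r in rows:
            if r[col] is not None:
                lexicon[r[col].lower()] = col
    return lexicon


def tag(question, lexicon, measures):
    """Tag tokens as (token, 'FILTER', column) or (token, 'MEASURE', column)."""
    tags = []
    for tok in question.lower().replace("?", "").split():
        if tok in lexicon:
            tags.append((tok, "FILTER", lexicon[tok]))
        elif tok in measures:  # the column name acts as the query 'trigger'
            tags.append((tok, "MEASURE", tok))
    return tags
```

On the sample data, `country` and `channel` come out as dimensions, `email` is dropped because every value is unique, and `revenue` is the only measure.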

All in all, with a bit of logic, good OLAP design, and a lot of up-front configuration, I got it answering basic questions like 'revenue in the US in 2016' within a weekend, using NLTK back in the day. Today I would probably give spaCy a try as the NLP engine.
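Once tokens are tagged, turning a question like 'revenue in the US in 2016' into SQL can be as simple as filling a template. A minimal sketch using only the standard library (`re`, `sqlite3`); the `sales` table, its columns, and the `MEASURES`/`DIMENSION_VALUES` lexicons are hypothetical stand-ins for the up-front configuration mentioned above, not NLTK's or spaCy's API.

```python
import re
import sqlite3

# Hypothetical up-front configuration: which words name measures, and which
# words stand for which dimension value.
MEASURES = {"revenue": "SUM(revenue)", "sales": "SUM(revenue)", "orders": "COUNT(*)"}
DIMENSION_VALUES = {
    "us": ("country", "US"),
    "usa": ("country", "US"),
    "germany": ("country", "DE"),
}
YEAR = re.compile(r"(19|20)\d\d")


def to_sql(question: str):
    """Translate a simple '<measure> in <place> in <year>' question into (sql, params)."""
    select, where, params = None, [], []
    for tok in re.findall(r"[a-z0-9]+", question.lower()):
        if tok in MEASURES and select is None:
            select = MEASURES[tok]
        elif tok in DIMENSION_VALUES:
            column, value = DIMENSION_VALUES[tok]
            where.append(f"{column} = ?")
            params.append(value)
        elif YEAR.fullmatch(tok):
            where.append("year = ?")
            params.append(int(tok))
    if select is None:
        raise ValueError("no measure word found in question")
    sql = f"SELECT {select} FROM sales"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


def demo() -> float:
    # Tiny in-memory fact table standing in for the OLAP source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (country TEXT, year INTEGER, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("US", 2016, 100.0), ("US", 2016, 50.0), ("US", 2017, 70.0), ("DE", 2016, 30.0)],
    )
    sql, params = to_sql("revenue in the US in 2016")
    result = conn.execute(sql, params).fetchone()[0]
    conn.close()
    return result


if __name__ == "__main__":
    print(to_sql("revenue in the US in 2016"))
    # ('SELECT SUM(revenue) FROM sales WHERE country = ? AND year = ?', ['US', 2016])
    print(demo())  # 150.0
```

Values taken from the question go through `?` placeholders rather than string formatting, so they are never spliced into the SQL text. The weakness is just as visible: "show us revenue" would tag the pronoun "us" as a country filter.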

NLTK in Python - it even has a basic example showing natural language to SQL translation that is pretty cool. Simplistic, but a good starting point for learning!

http://www.nltk.org/book/ch10.html

.NET has some speech synthesis and listener libraries. They work pretty well; I built a modestly functional chatbot once with them. Not sure about the overall .NET licensing arrangement, but I heard it was moving towards open source.

Though they feel abandoned, and there hasn't been much recent activity around them. Microsoft probably has all speech engineers working on Cortana instead. (Though I'd be surprised if she's not using .NET at some level.)

> Microsoft probably has all speech engineers working on Cortana instead

Microsoft Cognitive Services

The closest I've gotten to something useful is NLTK. It's really great and really powerful and there is plenty of documentation and how-to guides.
If you are playing in the SPARQL space, take a look at http://quepy.machinalis.com/
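Roughly speaking, Quepy matches questions against hand-written patterns and emits a query for each one. The sketch below shows that general idea with a single regular-expression template in plain Python; it does not use Quepy's own API, and the DBpedia-style prefixes and the `PROPERTIES` mapping (`dbo:populationTotal`, `dbo:areaTotal`) are assumptions about the target ontology.

```python
import re
from typing import Optional

PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)
# Hypothetical mapping from English nouns to ontology properties.
PROPERTIES = {"population": "dbo:populationTotal", "area": "dbo:areaTotal"}
# A single question shape: "what is the <property> of <entity>?"
# The entity character class excludes quotes and backslashes, so the captured
# name can go straight into a SPARQL string literal.
TEMPLATE = re.compile(r"what is the (\w+) of ([\w .'-]+?)\??", re.IGNORECASE)


def question_to_sparql(question: str) -> Optional[str]:
    """Return a SPARQL query for a matching question, or None if no template applies."""
    m = TEMPLATE.fullmatch(question.strip())
    if m is None:
        return None
    prop = PROPERTIES.get(m.group(1).lower())
    if prop is None:
        return None
    entity = m.group(2).strip()
    return (
        PREFIXES
        + "SELECT ?value WHERE {\n"
        + f'  ?entity rdfs:label "{entity}"@en .\n'
        + f"  ?entity {prop} ?value .\n"
        + "}"
    )
```

Anything outside the template returns `None`, which makes the coverage problem obvious: every new question shape needs another pattern.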

But NLP-based question answering is an unsolved problem, and the best way to tackle it is with ensemble approaches.
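One simple reading of "ensemble approaches" is to run several independent interpreters and take a weighted vote over their outputs. The sketch below assumes every interpreter returns a normalized tuple such as `("price", "<", 400000)`, or `None` to abstain; the three regex rules and their weights are hypothetical, chosen only to show a sloppy rule being outvoted.

```python
import re
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical normalized interpretation, e.g. ("price", "<", 400000).
Interpretation = tuple
Interpreter = Callable[[str], Optional[Interpretation]]


def digits_rule(q: str) -> Optional[Interpretation]:
    # "less than $400,000" -> ("price", "<", 400000); the \b rejects "400k".
    m = re.search(r"less than \$?(\d[\d,]*)\b", q)
    return ("price", "<", int(m.group(1).replace(",", ""))) if m else None


def k_suffix_rule(q: str) -> Optional[Interpretation]:
    # "under 400k" / "less than 400k" -> ("price", "<", 400000)
    m = re.search(r"(?:under|less than) \$?(\d+)k\b", q)
    return ("price", "<", int(m.group(1)) * 1000) if m else None


def naive_rule(q: str) -> Optional[Interpretation]:
    # Deliberately sloppy: treats the first number in the question as the price cap.
    m = re.search(r"\d+", q)
    return ("price", "<", int(m.group())) if m else None


def ensemble(question: str,
             interpreters: Sequence[Interpreter],
             weights: Optional[Sequence[float]] = None) -> Optional[Interpretation]:
    """Return the interpretation with the highest total weight, or None if all abstain."""
    if weights is None:
        weights = [1.0] * len(interpreters)
    votes: Counter = Counter()
    for interpret, weight in zip(interpreters, weights):
        result = interpret(question)
        if result is not None:  # interpreters may abstain
            votes[result] += weight
    if not votes:
        return None
    best, _ = votes.most_common(1)[0]
    return best
```

For "3-bedroom homes less than $400,000", `naive_rule` grabs the 3 from "3-bedroom", but its lower weight lets the `digits_rule` reading win.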