by tostitos1979 3238 days ago
Is there an open source NLP engine out there? I've been trying to learn this area and there are so many potholes and wrong paths ... I've looked at OWL/SPARQL, graph DBs, logic programming, and rule-based systems. I feel like I'm dancing around the real topic and I don't know what "it" is :'(
5 comments

I played around with this for a weekend or two, so my knowledge on the matter is not exhaustive, but it all boiled down to having a good OLAP-ish data source in the first place.

- You can do the named entity tagging based on the categorical data: columns that are text/strings with low-ish relative cardinality make good candidates. The cardinality check filters out text fields holding, for example, email addresses (which shouldn't be in a DWH as categoricals in the first place).

- FLOATs/decimals/integers would be good candidates for the values somebody is looking for, and the name of the column would be the 'trigger' of the query.

All in all, with a bit of logic, good OLAP design, and a lot of up-front configuration, I got it to answer basic questions like 'revenue in the US in 2016' in a weekend's time, using NLTK back in the day. Today I would probably give spaCy a try as the NLP engine.
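
To make the idea above concrete, here is a rough sketch in plain Python (no NLTK or spaCy, just the column heuristics). It is not the code I actually wrote; the table, the column names, the 0.5 cardinality threshold, and the functions `classify_columns` and `answer` are all made up for illustration. Note that `year` has to be forced into the dimensions by hand, which is the kind of up-front configuration I mean.

```python
import re

# Hypothetical toy fact table standing in for a real OLAP source.
ROWS = [
    {"country": "US", "year": 2016, "revenue": 120.0, "email": "a@example.com"},
    {"country": "US", "year": 2016, "revenue": 30.0, "email": "b@example.com"},
    {"country": "US", "year": 2017, "revenue": 150.0, "email": "c@example.com"},
    {"country": "DE", "year": 2016, "revenue": 80.0, "email": "d@example.com"},
    {"country": "DE", "year": 2017, "revenue": 20.0, "email": "e@example.com"},
    {"country": "DE", "year": 2017, "revenue": 10.0, "email": "f@example.com"},
]


def classify_columns(rows, max_relative_cardinality=0.5, forced_dimensions=("year",)):
    """Split columns into dimensions (entity vocabularies) and measures."""
    dimensions, measures = {}, []
    for col in rows[0]:
        values = [row[col] for row in rows]
        if col in forced_dimensions:
            # Manual configuration: numeric, but used as a filter.
            dimensions[col] = {str(v).lower() for v in values}
        elif all(isinstance(v, str) for v in values):
            # Low relative cardinality => categorical; emails get dropped here.
            if len(set(values)) / len(values) <= max_relative_cardinality:
                dimensions[col] = {v.lower() for v in values}
        elif all(isinstance(v, (int, float)) for v in values):
            measures.append(col)
    return dimensions, measures


def answer(question, rows):
    """Sum the measure named in the question over rows matching tagged entities."""
    dimensions, measures = classify_columns(rows)
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    # The column name acts as the 'trigger' of the query.
    measure = next((m for m in measures if m in tokens), None)
    if measure is None:
        return None
    # Tag tokens that appear in a dimension's vocabulary.
    filters = {
        col: tok
        for tok in tokens
        for col, vocab in dimensions.items()
        if tok in vocab
    }
    matching = [
        row for row in rows
        if all(str(row[col]).lower() == val for col, val in filters.items())
    ]
    return sum(row[measure] for row in matching)


print(answer("revenue in the US in 2016", ROWS))
```

Plain token matching like this breaks quickly (the pronoun "us" would be tagged as a country), which is where a real tokenizer and tagger from NLTK or spaCy would come in.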

NLTK in Python - it even has a basic example showing natural language to SQL translation that is pretty cool. Simplistic, but a good starting point for learning!

http://www.nltk.org/book/ch10.html

.NET has some speech synthesis and listener libraries. They work pretty well; I built a modestly functional chat bot with them once. Not sure about the overall .NET licensing arrangement, but I heard it was moving towards open source.

They feel abandoned, though, and there hasn't been much recent activity around them. Microsoft probably has all speech engineers working on Cortana instead. (Though I'd be surprised if she's not using .NET at some level.)

> Microsoft probably has all speech engineers working on Cortana instead

Microsoft Cognitive Services

The closest I've gotten to something useful is NLTK. It's really great and really powerful and there is plenty of documentation and how-to guides.
If you are playing in the SPARQL space, take a look at http://quepy.machinalis.com/

But NLP-based question answering is an unsolved problem, and the best way to approach it is with an ensemble of approaches.
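
A minimal sketch of what I mean by an ensemble, assuming each component returns either `None` or a `(answer, confidence)` pair. The three answerers here are hypothetical stand-ins (a rule-based matcher, a fact lookup, and a deliberately unreliable guesser), and summing confidences is just one simple way to combine them.

```python
from collections import defaultdict


def keyword_answerer(question):
    # Hypothetical stand-in for a rule-based system.
    if "capital of france" in question.lower():
        return ("Paris", 0.9)
    return None


def lookup_answerer(question):
    # Hypothetical stand-in for a knowledge-base lookup.
    facts = {"capital of france": "Paris", "capital of spain": "Madrid"}
    for key, value in facts.items():
        if key in question.lower():
            return (value, 0.6)
    return None


def noisy_answerer(question):
    # Hypothetical stand-in for a component that is sometimes wrong.
    if "france" in question.lower():
        return ("Lyon", 0.5)
    return None


def ensemble_answer(question, answerers):
    """Sum confidences per candidate answer and return the best-scoring one."""
    scores = defaultdict(float)
    for answerer in answerers:
        result = answerer(question)
        if result is not None:
            candidate, confidence = result
            scores[candidate] += confidence
    if not scores:
        return None
    return max(scores, key=scores.get)


ANSWERERS = [keyword_answerer, lookup_answerer, noisy_answerer]
print(ensemble_answer("What is the capital of France?", ANSWERERS))
```

The point is that no single component has to be right all the time; agreement between independent components outweighs one confident mistake.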