by cleansy 3238 days ago
I played around with this for a weekend or two, so my knowledge of the matter is not exhaustive, but it all boiled down to having a good OLAP-ish data source in the first place.

- You can do the named entity tagging based on the categorical data. Text/string columns with low-ish relative cardinality make good candidates, and that criterion also filters out text fields holding, for example, email addresses (which shouldn't be in a DWH as categoricals in the first place).

- FLOATs/decimals/integers would be good candidates for the values that somebody is looking for (and the name of the column would be the 'trigger' of the query).
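
To make the two heuristics above concrete, here is a minimal sketch in plain Python. It is not the code I ran back then: the function name `classify_columns`, the 0.5 cardinality threshold and the sample rows are all made up for illustration, and a real DWH would read column types from the schema instead of inspecting values.

```python
from numbers import Number

def classify_columns(rows, max_relative_cardinality=0.5):
    """Split columns into dimensions (entity candidates) and measures.

    rows is a list of dicts, one per record. The threshold is an
    illustrative assumption, not a recommended value.
    """
    dimensions, measures, skipped = [], [], []
    for col in rows[0].keys():
        values = [r[col] for r in rows if r[col] is not None]
        if values and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        ):
            # Numeric column: a value somebody might ask for.
            measures.append(col)
        elif values and all(isinstance(v, str) for v in values):
            # Relative cardinality = distinct values / total values.
            ratio = len(set(values)) / len(values)
            if ratio <= max_relative_cardinality:
                dimensions.append(col)
            else:
                # Near-unique text (e.g. email addresses) is filtered out.
                skipped.append(col)
        else:
            skipped.append(col)
    return dimensions, measures, skipped

# Hypothetical sample data.
rows = [
    {"country": "US", "email": "a@example.com", "revenue": 120.0},
    {"country": "US", "email": "b@example.com", "revenue": 80.0},
    {"country": "DE", "email": "c@example.com", "revenue": 50.0},
    {"country": "DE", "email": "d@example.com", "revenue": 70.0},
]
dimensions, measures, skipped = classify_columns(rows)
```

With these rows, `country` ends up as a dimension, `revenue` as a measure, and `email` is skipped because every value is distinct.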

All in all, with a bit of logic, good OLAP design and a lot of up-front configuration, I got it to answer basic questions like 'revenue in the US in 2016' in a weekend's time, using NLTK back in the day. Today I would probably give spaCy a try as the NLP engine.
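
Roughly what answering such a question looks like, as a sketch under stated assumptions: a regex tokenizer stands in for NLTK/spaCy, and the lookup tables `MEASURES` and `DIMENSION_VALUES`, the `FACTS` rows and both function names are hypothetical stand-ins for the up-front configuration and the OLAP source.

```python
import re

# Hypothetical up-front configuration.
MEASURES = {"revenue": "revenue"}  # trigger word -> numeric column
DIMENSION_VALUES = {               # known categorical value -> (column, value)
    "us": ("country", "US"),
    "germany": ("country", "DE"),
}

# Hypothetical fact table standing in for the OLAP source.
FACTS = [
    {"country": "US", "year": 2016, "revenue": 120.0},
    {"country": "US", "year": 2015, "revenue": 90.0},
    {"country": "DE", "year": 2016, "revenue": 50.0},
]

def parse_question(question):
    """Map tokens to a measure column and a dict of dimension filters."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    measure, filters = None, {}
    for tok in tokens:
        if tok in MEASURES:
            measure = MEASURES[tok]
        elif tok in DIMENSION_VALUES:
            col, val = DIMENSION_VALUES[tok]
            filters[col] = val
        elif re.fullmatch(r"(19|20)\d{2}", tok):
            # Treat a four-digit number as a year filter.
            filters["year"] = int(tok)
    return measure, filters

def answer(question, facts=FACTS):
    """Sum the requested measure over the rows matching all filters."""
    measure, filters = parse_question(question)
    if measure is None:
        return None
    return sum(
        r[measure]
        for r in facts
        if all(r[col] == val for col, val in filters.items())
    )

result = answer("revenue in the US in 2016")
```

Here the column name 'revenue' acts as the trigger, 'US' and '2016' become filters, and everything else in the question is ignored, which is why the approach only holds up for basic questions.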