by nl 3692 days ago
The state of NLP tools is generally much worse than most people think. The problem looks far easier than it is.

For the date parser you want http://nlp.stanford.edu/software/sutime.html

The code and rules aren't fun to customize though.

Yeah, I looked at SuTime, but it fell down on many common cases (the CoreNLP online demo actually integrates SuTime into the annotations it produces).

Another option is Natty [1], but it also seems to fail on the same examples. Natty at least has an ANTLR grammar that's reasonably easy to understand, though.

[1] http://natty.joestelmach.com/

I know of one large group that switched (from Timen[1]) to Heideltime[2] because of multi-language support.

One day someone will build a neural net model to do this rather than relying on hand-written rules.

[1] https://github.com/leondz/timen

[2] https://github.com/HeidelTime/heideltime

Thanks 'nl, 'nostrademons and 'rcpt for the links! I've been using Chronicity[0] in my project, and I hand-hacked a Polish-to-English regexp "translator" to make it work with the Polish language[1]. I'll be looking at the sources of the libraries you provided as well as the papers they reference; maybe I'll manage to steal some code :).

[0] - https://github.com/chaitanyagupta/chronicity

[1] - it's surprising how easy it is to get 80% there with hacks like these: https://github.com/TeMPOraL/alice/blob/master/language.lisp#...
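
For anyone curious what a regexp "translator" like that might look like, here is a minimal Python sketch of the general idea. It is not the linked Lisp code: the rule table, the `translate` name and the handful of phrases covered are all made up for illustration. The translated string would then be handed to an English-only date parser such as Chronicity.

```python
import re

# Hypothetical rule table: (Polish pattern, English replacement).
# It covers only a few phrases; a real table would need many more rules.
RULES = [
    (r"\bjutro\b", "tomorrow"),
    (r"\bdzisiaj\b", "today"),
    (r"\bwczoraj\b", "yesterday"),
    (r"\bza (\d+) dni\b", r"in \1 days"),
    (r"\bza (\d+) godzin(?:y|ę)?\b", r"in \1 hours"),
    (r"\bo (\d{1,2}):(\d{2})\b", r"at \1:\2"),
]


def translate(text: str) -> str:
    """Rewrite Polish date phrases into English, one rule after another."""
    result = text.lower()
    for pattern, replacement in RULES:
        result = re.sub(pattern, replacement, result)
    return result


if __name__ == "__main__":
    for phrase in ["Jutro o 17:30", "za 3 dni", "za 2 godziny"]:
        print(phrase, "->", translate(phrase))
```

Rules are applied in order, so anything no rule matches passes through untouched. That is roughly why this kind of hack gets you most of the way and then stalls on inflection and word order.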