by nostrademons 3692 days ago
Yeah, I looked at SuTime, but it fell down on many common cases (the CoreNLP online demo actually integrates SuTime into the annotations it produces).

Another option is Natty [1], but it also seems to fail on the same examples. Natty at least has an ANTLR grammar that's reasonably easy to understand, though.

[1] http://natty.joestelmach.com/


I know of one large group that switched from Timen[1] to HeidelTime[2] because of its multi-language support.

One day someone will build a neural net model to do this rather than relying on hand-written rules.

[1] https://github.com/leondz/timen

[2] https://github.com/HeidelTime/heideltime

Thanks 'nl, 'nostrademons and 'rcpt for the links! I've been using Chronicity[0] in my project, and I hand-hacked a Polish-to-English regexp "translator" to make it work with the Polish language[1]. I'll be looking at the sources of the libraries you provided as well as the papers they reference; maybe I'll manage to steal some code :).

[0] - https://github.com/chaitanyagupta/chronicity

[1] - it's surprising how easy it is to get 80% of the way there with hacks like these: https://github.com/TeMPOraL/alice/blob/master/language.lisp#...
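
For anyone wondering what such a regexp "translator" might look like, here's a minimal Python sketch. It is not the Lisp code from the repo linked above; the phrase table and the name `polish_to_english` are made up for illustration. The idea is just to rewrite a handful of Polish date phrases into English and then hand the result to an English-only parser such as Chronicity.

```python
import re

# Hypothetical phrase table (an assumption, not taken from language.lisp).
# Each entry is (regex over lowercased Polish text, English replacement).
RULES = [
    (r"\bpojutrze\b", "the day after tomorrow"),
    (r"\bjutro\b", "tomorrow"),
    (r"\b(?:dzisiaj|dziś)\b", "today"),
    (r"\bwczoraj\b", "yesterday"),
    (r"\bza (\d+) dni\b", r"in \1 days"),
    (r"\bza (\d+) godziny?\b", r"in \1 hours"),
    (r"\bo (\d{1,2}(?::\d{2})?)\b", r"at \1"),
]


def polish_to_english(text: str) -> str:
    """Rewrite known Polish date phrases into English; leave the rest alone."""
    result = text.lower()
    for pattern, replacement in RULES:
        result = re.sub(pattern, replacement, result)
    return result


print(polish_to_english("jutro o 15:30"))  # tomorrow at 15:30
```

Anything the table doesn't cover passes through untouched, which is where the remaining 20% lives: Polish inflection ("za 1 dzień", "za 5 godzin") needs more rules than this sketch has.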