by mkl 2723 days ago
Do you know if any of these can be used for text prediction? (I.e. guessing what the next word/token will be.)

Text prediction is usually called "language modeling" in NLP. Because it's useful as a weak supervision signal to improve performance on other tasks, most of the mentioned libraries support it. However, they might not always provide complete examples, instead assuming that you know how to express the model and train it using the primitives provided by the library.

Flair: https://github.com/zalandoresearch/flair/blob/master/flair/m...

Allen NLP: https://github.com/allenai/allennlp/blob/master/allennlp/dat...

PyText: https://github.com/facebookresearch/pytext/blob/master/pytex...

spaCy seems to focus on language analysis and I couldn't find an API that'd be directly usable for text generation.

Flair looks really promising to me!
Markov chains can be used for type-ahead prediction. That's likely what iOS uses for its predictive keyboard.

https://en.wikipedia.org/wiki/Markov_chain
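
To make that concrete, here's a minimal sketch of a first-order (bigram) Markov chain predictor in plain Python. It's only an illustration, not what any particular keyboard actually does; the function names and the toy corpus are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each token is followed by each other token."""
    followers = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, token, k=3):
    """Return up to k of the most frequent followers of `token`."""
    # .get avoids adding an empty entry for tokens never seen in training.
    counts = followers.get(token, Counter())
    return [word for word, _ in counts.most_common(k)]


# Toy corpus (hypothetical); a real one would be far larger.
corpus = "the cat sat on the mat and the cat ate the fish".split()
model = train_bigram_model(corpus)

print(predict_next(model, "the"))     # "cat" ranks first: it follows "the" twice
print(predict_next(model, "banana"))  # unseen token, so no suggestions
```

A real system would condition on more than one previous token and add smoothing for unseen pairs, but the counting idea is the same.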

Yes, there are plenty of methods, and I have a couple implemented, but an off-the-shelf one from a cutting edge library would likely be better.
It's gonna be hard to get an "off the shelf" model for text prediction, because the upcoming text depends on the author, topic, and other context. You can probably find some decent pre-trained models to get started, but you'll need to customize them for your application to get good results.
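
As a rough sketch of what that customization could look like: one simple option (an assumption on my part, not something any of the libraries above is known to do) is to linearly interpolate a generic model's next-word probabilities with counts collected from the user's own text. The probabilities, counts, and function name below are all hypothetical.

```python
def interpolated_scores(generic_probs, user_counts, weight=0.5):
    """Blend a generic next-word distribution with a user's own counts.

    generic_probs: dict word -> probability from a pre-trained model
    user_counts:   dict word -> times this user typed the word in this context
    weight:        how much to trust the user's data (0.0 to 1.0)
    """
    total = sum(user_counts.values())
    if total == 0:
        weight = 0.0  # no user data yet, fall back to the generic model
    scores = {}
    for word in set(generic_probs) | set(user_counts):
        user_p = user_counts.get(word, 0) / total if total else 0.0
        scores[word] = (1 - weight) * generic_probs.get(word, 0.0) + weight * user_p
    return scores


# Hypothetical numbers for the context "thank ...".
generic_probs = {"you": 0.6, "everyone": 0.3, "team": 0.1}
user_counts = {"team": 8, "you": 2}

scores = interpolated_scores(generic_probs, user_counts, weight=0.5)
best = max(scores, key=scores.get)
print(best)  # the user's habit of typing "team" outweighs the generic "you"
```

With no user data the ranking is just the generic model's, and it shifts toward the user's habits as counts accumulate.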
Right, I was thinking off-the-shelf in the sense of giving it a tokenised corpus and it does the rest, or it incorporates that into its existing model. Dictation software, phone keyboards, etc. do this.