by spidersouris 638 days ago
>I'm frankly tired of this problem being a thing. I'm working on a dataset full of entries for every letter in every word of /usr/share/dict/words so that it can be added to the training sets of language models.

Interesting, but won't it take a while to generate the data? Will zero answers (letters that don't appear in a word at all) be part of the dataset as well?
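
To make the question concrete, here is a rough sketch of what generating such entries might look like. It assumes the dataset is one (word, letter, count) row per letter, which is only a guess at the format the parent has in mind; `letter_count_rows`, `build_rows`, and the `include_zeros` flag are hypothetical names made up for illustration.

```python
from collections import Counter
import string


def letter_count_rows(word, include_zeros=True):
    """Return (word, letter, count) rows for a single word.

    With include_zeros=True, every letter a-z gets a row, so letters
    that never occur in the word appear with a count of 0.
    """
    counts = Counter(word.lower())
    if include_zeros:
        letters = string.ascii_lowercase
    else:
        # Only the a-z letters that actually occur in the word.
        letters = sorted(c for c in counts if c in string.ascii_lowercase)
    return [(word, letter, counts[letter]) for letter in letters]


def build_rows(words, include_zeros=True):
    """Collect rows for an iterable of words."""
    rows = []
    for word in words:
        rows.extend(letter_count_rows(word, include_zeros))
    return rows


if __name__ == "__main__":
    # Path taken from the quoted comment; it may not exist on every system.
    with open("/usr/share/dict/words") as f:
        words = [line.strip() for line in f if line.strip()]
    rows = build_rows(words)
    print(len(words), "words ->", len(rows), "rows")
```

Under that assumed format, the work is a single pass over each word, so generation time is probably less of a concern than the size of the output: with zero answers included, every word produces 26 rows regardless of its length.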