Hacker News
by wooorm 4272 days ago
No full frequency data is kept; only the top 300 trigrams are identified. A quick look through the source also reveals wooorm/trigrams and wooorm/udhr as sources!
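For illustration, here is a minimal Python sketch of what "only the top trigrams, no frequencies" could look like. It is not the library's actual code: the function name `top_trigrams`, the whitespace padding, and the lowercasing are assumptions made for the example.

```python
from collections import Counter


def top_trigrams(text, limit=300):
    """Return up to `limit` trigrams, most frequent first (hypothetical helper)."""
    # Assumption: normalise whitespace, lowercase, and pad with one space per side.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    # The counts are discarded here; only the rank order survives.
    return [trigram for trigram, _ in counts.most_common(limit)]
```

The result is a plain ranked list, so the storage cost is bounded by `limit` regardless of how much text was counted.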

Yes, I meant that keeping the full frequency data can be avoided to save space/memory, but having two classes (high/low) could be a good tradeoff.
It’s an interesting thought. I might fiddle with it, but I’m not sure it would work in practice (d’oh). Thanks!
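A rough Python sketch of the two-class idea discussed above, in case it helps anyone fiddling with it. Everything here is hypothetical: the name `two_class_trigrams`, the `high_share` parameter, and the choice to split the ranked list in half are assumptions, and whether this helps detection in practice is untested.

```python
from collections import Counter


def two_class_trigrams(text, limit=300, high_share=0.5):
    """Map each top trigram to "high" or "low" instead of a count (hypothetical)."""
    # Assumption: same normalisation as a plain top-N model (lowercase, padded).
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    ranked = [trigram for trigram, _ in counts.most_common(limit)]
    # Assumption: the first `high_share` of the ranked list counts as "high".
    cutoff = int(len(ranked) * high_share)
    return {
        trigram: ("high" if rank < cutoff else "low")
        for rank, trigram in enumerate(ranked)
    }
```

This keeps one bit per trigram rather than a full count, which is the space/memory tradeoff being suggested.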