by meiji163 1894 days ago
My favorite is Context Tree Weighting (CTW) (https://ieeexplore.ieee.org/document/382012). Mathematically, it's much nicer than PPM. Modern implementations of CTW achieve compression ratios close to, or better than, PPM in most domains (see, e.g., P. Volf's thesis, "Weighting techniques in data compression").
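For readers who haven't seen CTW, here is a minimal sketch of the basic binary scheme. It assumes the textbook setup: a fixed-depth context tree, the Krichevsky-Trofimov (KT) estimator at every node, and the mixing rule P_w = ½·P_e + ½·P_w(child 0)·P_w(child 1) at internal nodes, with leaves using P_e directly. It only computes the ideal code length (−log2 of the root's weighted probability) rather than emitting bits. The zero-padded initial context and all function names are my own illustrative assumptions, not taken from the paper or from Volf's thesis.

```python
import math


class Node:
    """One context-tree node: symbol counts plus log2 of P_e and P_w."""
    __slots__ = ("counts", "log_pe", "log_pw", "children")

    def __init__(self) -> None:
        self.counts = [0, 0]          # zeros seen, ones seen in this context
        self.log_pe = 0.0             # log2 of the KT estimate for bits seen here
        self.log_pw = 0.0             # log2 of the weighted probability
        self.children = [None, None]  # indexed by the context bit one step further back


def log2_mix(x: float, y: float) -> float:
    """log2(0.5 * 2**x + 0.5 * 2**y), computed without underflow."""
    hi, lo = max(x, y), min(x, y)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi)) - 1.0


def ctw_code_length(bits: list[int], depth: int) -> float:
    """Ideal code length in bits, -log2 P_w(root), of `bits` under depth-`depth` CTW."""
    root = Node()
    history = [0] * depth  # assumption: the context before the first bit is all zeros
    for bit in bits:
        context = history[len(history) - depth:][::-1]  # most recent bit first
        # Walk (and grow) the path from the root down to the depth-`depth` leaf.
        path = [root]
        for c in context:
            node = path[-1]
            if node.children[c] is None:
                node.children[c] = Node()
            path.append(node.children[c])
        # Update every node on the path, from the leaf back up to the root.
        for d in range(depth, -1, -1):
            node = path[d]
            a, b = node.counts
            # Sequential KT update: P(bit | a, b) = (count[bit] + 1/2) / (a + b + 1)
            node.log_pe += math.log2((node.counts[bit] + 0.5) / (a + b + 1))
            node.counts[bit] += 1
            if d == depth:
                node.log_pw = node.log_pe  # leaves use P_e directly
            else:
                # An unvisited child has seen no bits, so its P_w is 1 (log2 = 0).
                kids = sum(ch.log_pw for ch in node.children if ch is not None)
                node.log_pw = log2_mix(node.log_pe, kids)
        history.append(bit)
    return -root.log_pw


alternating = [i % 2 for i in range(200)]
print(ctw_code_length(alternating, 0))  # no context: about 1 bit per input bit
print(ctw_code_length(alternating, 1))  # one bit of context: far fewer bits
```

Working in log2 keeps the products from underflowing on long inputs. An arithmetic coder driven by the same conditional probabilities would land within a few bits of this ideal length.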

If you're compressing Turkish, you can use syllables as symbols. Turkish has the nice property that syllable division is deterministic, and it can be done with a neat state machine.
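A minimal rule-based sketch of the idea (a plain left-to-right scan, not the state machine from the paper linked below). It assumes a lowercase Turkish word made only of letters and applies the usual rule for native words: each syllable has exactly one vowel, and a consonant directly before a vowel opens a new syllable. The names `TURKISH_VOWELS`, `syllabify`, and `syllable_ids` are hypothetical, chosen for illustration.

```python
# Assumption: lowercase input; circumflexed vowels are included for loanwords.
TURKISH_VOWELS = set("aeıioöuüâîû")


def syllabify(word: str) -> list[str]:
    """Split a lowercase Turkish word into syllables (illustrative sketch)."""
    vowel_positions = [i for i, ch in enumerate(word) if ch in TURKISH_VOWELS]
    if not vowel_positions:
        return [word] if word else []
    boundaries = [0]
    for v in vowel_positions[1:]:
        # A consonant right before a vowel starts the new syllable (ki-tap, türk-çe);
        # between two adjacent vowels the boundary falls at the second vowel (sa-at).
        start = v - 1 if word[v - 1] not in TURKISH_VOWELS else v
        boundaries.append(start)
    boundaries.append(len(word))
    return [word[a:b] for a, b in zip(boundaries, boundaries[1:])]


def syllable_ids(text: str, vocab: dict[str, int]) -> list[int]:
    """Map each syllable to an integer symbol, growing `vocab` as needed.

    Simplification: whitespace is dropped here; a real compressor would also
    have to encode word boundaries.
    """
    ids = []
    for word in text.split():
        for syl in syllabify(word):
            ids.append(vocab.setdefault(syl, len(vocab)))
    return ids


print(syllabify("merhaba"))     # ['mer', 'ha', 'ba']
print(syllabify("istanbul"))    # ['is', 'tan', 'bul']
print(syllable_ids("baba kaba", {}))  # [0, 0, 1, 0]
```

The resulting integer stream could then be fed to any order-k model (PPM, CTW over a larger alphabet, etc.). Uppercase input would need Turkish-aware lowercasing first, since `I`/`ı` and `İ`/`i` do not map the way Python's default `str.lower()` expects.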

I implemented it once, but I didn't manage to store the result very efficiently on disk. Nevertheless, it's promising. Here's what I did:

https://www.ccis2k.org/iajit/PDF/vol.8,no.1/12.pdf