by bayindirh 1898 days ago
If you're compressing Turkish, you can use syllables. Turkish has a nice feature of deterministic syllable division, which can be done via a neat state machine.
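To make that concrete, here is a minimal sketch of rule-based syllable splitting. It is not the implementation from the paper linked in this comment, and it uses a right-to-left scan rather than a formal state machine. It assumes the usual rule that every syllable has exactly one vowel and that a single consonant directly before a vowel opens that vowel's syllable. The names `VOWELS` and `syllabify` are made up for illustration, and input is assumed to be a single lowercase word.

```python
# Turkish vowels, including the circumflexed forms found in some loanwords.
VOWELS = set("aeıioöuüâîû")


def syllabify(word: str) -> list[str]:
    """Split one lowercase Turkish word into syllables (illustrative sketch)."""
    syllables = []
    end = len(word)      # exclusive end of the syllable being built
    i = len(word) - 1    # scan position, moving right to left
    while i >= 0:
        if word[i] in VOWELS:
            # A consonant directly before the vowel becomes the onset.
            has_onset = i > 0 and word[i - 1] not in VOWELS
            start = i - 1 if has_onset else i
            syllables.append(word[start:end])
            end = start
            i = start - 1
        else:
            # Trailing consonants stay in the coda of the current syllable.
            i -= 1
    if end > 0:
        # Leftover leading consonants (e.g. "tr" in "tren") join the
        # first syllable; a word with no vowel is returned whole.
        if syllables:
            syllables[-1] = word[:end] + syllables[-1]
        else:
            syllables.append(word[:end])
    syllables.reverse()
    return syllables
```

With this sketch, `syllabify("kitap")` gives `["ki", "tap"]` and `syllabify("türkçe")` gives `["türk", "çe"]`. Because the split depends only on the letters, a decoder can rebuild words from a syllable stream without any extra boundary information, which is what makes syllables attractive as compression symbols.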

I implemented it once, but I couldn't encode the result very efficiently on disk. Nevertheless, it's promising. Here's what I've done:

https://www.ccis2k.org/iajit/PDF/vol.8,no.1/12.pdf