Hacker News new | ask | show | jobs
by zzo38computer 2389 days ago
How well does that corpus compress with LZMA if using a Korean specific character code (such as EUC-KR)? And what about other combinations, with other character codings and other compression algorithms?
1 comments

EUC-KR doesn't improve much with LZMA (2% over UTF-16), but is better with gzip-9 (10% over UTF-16). I haven't studied this extensively, just did a few tests when waiting for it to download.