by gsnedders 4681 days ago
I can't find the research hsivonen (probably best known for validator.nu and Gecko's HTML parser) did on this a while ago, but he concluded that for the top 100 (IIRC) CJK sites, pages were smaller as gzip'd UTF-8 than as gzip'd UTF-16. They all contained enough HTML/CSS/JS that the savings from encoding that ASCII markup in one byte per character (instead of two in UTF-16) outweighed the savings UTF-16 gives on the CJK text itself (two bytes per character instead of three in UTF-8).
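If you want to check the arithmetic yourself, here's a minimal Python sketch. It is not hsivonen's methodology, and the sample page is a made-up example, not one of the sites he looked at. Before compression, a page with a ASCII characters and c CJK characters (all in U+0800..U+FFFF) takes a + 3c bytes in UTF-8 and 2a + 2c bytes in UTF-16, so UTF-8 wins whenever a > c. gzip changes the absolute numbers, so the script measures the compressed sizes too rather than assuming anything about them.

```python
import gzip


def encoded_sizes(text: str) -> dict[str, int]:
    """Byte counts for `text` as UTF-8 and UTF-16, before and after gzip.

    UTF-16 is encoded little-endian without a BOM ("utf-16-le") so the
    comparison isn't skewed by the 2-byte byte-order mark.
    """
    sizes: dict[str, int] = {}
    for codec in ("utf-8", "utf-16-le"):
        raw = text.encode(codec)
        sizes[codec] = len(raw)
        sizes[codec + "+gzip"] = len(gzip.compress(raw))
    return sizes


# Hypothetical sample page: mostly ASCII markup with a little Chinese text.
SAMPLE_PAGE = (
    '<!DOCTYPE html><html lang="zh"><head><meta charset="utf-8">'
    "<title>新闻</title>"
    '<link rel="stylesheet" href="/static/main.css">'
    '<script src="/static/app.js" defer></script></head>'
    '<body><div class="article"><p>今天天气很好。</p></div></body></html>'
)

if __name__ == "__main__":
    for name, size in encoded_sizes(SAMPLE_PAGE).items():
        print(f"{name:>15}: {size} bytes")
```

Swap in a real saved page to get meaningful gzip numbers; a page this small is dominated by gzip's fixed header overhead.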