by moconnor 4331 days ago
The English text is 97 ASCII encoded bytes.

Compressed with zlib: 86 bytes.

Compressed with lzma: 98 bytes.

The Ithkuil representation is just 30 UTF-8 encoded bytes.

Compressed with zlib: 39 bytes.

Compressed with lzma: 47 bytes.

(Measured using Python's zlib/pylzma modules to avoid e.g. file header overhead)
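For anyone who wants to repeat this kind of measurement, here is a rough sketch. It is not the exact script used for the numbers above: it swaps in the standard-library lzma module instead of pylzma, so the LZMA figures will not necessarily match, and the sample string is a placeholder rather than the sentence from the article. The function name compressed_sizes is made up for illustration.

```python
import lzma
import zlib


def compressed_sizes(text: str) -> dict:
    """Return byte counts for the UTF-8 text and its compressed forms."""
    data = text.encode("utf-8")
    return {
        "raw": len(data),
        # zlib format: deflate stream plus a small header and checksum.
        "zlib": len(zlib.compress(data, 9)),
        # Legacy .lzma container from the stdlib (assumption: not the
        # same framing as pylzma, so sizes may differ from the post).
        "lzma": len(lzma.compress(data, format=lzma.FORMAT_ALONE)),
    }


# Placeholder input, NOT the sentence discussed in the article.
sample = "This is a placeholder sentence, not the text from the article."
print(compressed_sizes(sample))
```

On inputs this short the container overhead is a large share of the output, which is why the choice of framing matters so much here.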

It's hard to achieve this kind of compression without an external dictionary. What Quijada has created with Ithkuil is, in part, a dictionary for the space of human thought and concepts, something I wouldn't have expected to work in the way the article describes it.

1 comment

Actually, using the zlib format gets you an unnecessary 2-byte header and 4-byte footer (the checksum), so the proper sizes are 80 and 33 bytes.
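A quick way to check the 6-byte overhead is to ask Python's zlib module for a raw deflate stream by passing a negative wbits value. This is only a sketch: the helper name and the sample bytes are made up, and the compression level is an assumption since the original measurement did not state one.

```python
import zlib


def zlib_and_raw_deflate_sizes(data: bytes, level: int = 9) -> tuple:
    """Return (zlib-format size, raw-deflate size) for the same input."""
    wrapped = zlib.compress(data, level)

    # wbits=-15 selects a raw deflate stream: same 32 KiB window,
    # but no 2-byte zlib header and no 4-byte Adler-32 trailer.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    raw = comp.compress(data) + comp.flush()

    return len(wrapped), len(raw)


# Placeholder input, not the text from the article.
wrapped_size, raw_size = zlib_and_raw_deflate_sizes(b"an arbitrary sample sentence")
print(wrapped_size, raw_size, wrapped_size - raw_size)
```

The difference between the two sizes is the framing overhead, which is what gets subtracted from 86 and 39 above.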

I'm having trouble figuring out what's going on with lzma because the spec is lying about the header, so I won't attempt to guess the correct number there.