by yongjik 4331 days ago
That was an interesting read, but the reporter's breathless assertions frequently got in the way of appreciating Quijada and his idea.

I mean, things like:

> A sentence like “On the contrary, I think it may turn out that this rugged mountain range trails off at some point” becomes simply “Tram-mļöi hhâsmařpţuktôx.”

Simply?

We could have used the LZW algorithm and the sentence could probably become even shorter, just a "simple" sequence of random-ish bytes. If you increase the number of allowed symbols, of course you need fewer symbols to convey the same information. If you allow for a limitless set of words that are dynamically generated from combining many roots, of course the number of words decreases... sometimes down to 1, as in polysynthetic languages. This is Information Theory 101.
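To make the symbol-count point concrete, here is a minimal sketch. The function name `min_symbols` is made up for illustration, and it assumes every symbol is equally likely, which real languages are not. It finds the smallest number of symbols from an alphabet of a given size that can distinguish 2**bits different messages.

```python
def min_symbols(bits: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size**n >= 2**bits.

    Hypothetical helper: assumes a fixed-length code over equally likely symbols.
    """
    if bits < 0 or alphabet_size < 2:
        raise ValueError("need bits >= 0 and alphabet_size >= 2")
    target = 1 << bits      # number of distinct messages to tell apart
    capacity = 1            # messages expressible with n symbols so far
    n = 0
    while capacity < target:
        capacity *= alphabet_size
        n += 1
    return n


if __name__ == "__main__":
    # The same 100 bits of information over ever larger alphabets.
    for size in (2, 26, 256, 65536):
        print(size, min_symbols(100, size))
```

The count only goes down as the alphabet grows, which is the sense in which a shorter string by itself says little about how much is being expressed.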

1 comments

The English text is 97 ASCII-encoded bytes.

Compressed with zlib: 86 bytes.

Compressed with lzma: 98 bytes.

The Ithkuil representation is just 30 UTF-8-encoded bytes.

Compressed with zlib: 39 bytes.

Compressed with lzma: 47 bytes.

(Measured using Python's zlib/pylzma modules to avoid overhead such as file headers.)
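For anyone who wants to repeat the measurement, a rough sketch follows. It is not the script used above: it swaps the third-party pylzma module for Python's standard-library `lzma` (an assumption, written with its legacy .lzma container), and the compression level is a guess, so the compressed sizes it prints may not match the numbers quoted. The helper name `sizes` is made up.

```python
import lzma
import unicodedata
import zlib

ENGLISH = ("On the contrary, I think it may turn out that this rugged "
           "mountain range trails off at some point")
# NFC keeps each accented letter as a single precomposed code point,
# so the UTF-8 byte count does not depend on how the text was typed.
ITHKUIL = unicodedata.normalize("NFC", "Tram-mļöi hhâsmařpţuktôx")


def sizes(text: str) -> dict:
    """Return raw and compressed byte counts for the UTF-8 encoding of text."""
    raw = text.encode("utf-8")
    return {
        "raw": len(raw),
        # zlib format: deflate data plus zlib's own header and checksum.
        "zlib": len(zlib.compress(raw, 9)),
        # Assumption: stdlib lzma instead of pylzma; the legacy .lzma
        # container carries its own header, so this is not a bare stream.
        "lzma": len(lzma.compress(raw, format=lzma.FORMAT_ALONE)),
    }


if __name__ == "__main__":
    for name, text in (("English", ENGLISH), ("Ithkuil", ITHKUIL)):
        print(name, sizes(text))
```

On inputs this short, the fixed overhead of each format is a large share of the output, which is why both texts can come out larger after compression than before.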

It's hard to achieve this kind of compression without an external dictionary. What Quijada has created with Ithkuil is, in part, a dictionary for the space of human thought and concepts, something I wouldn't have expected to work in the way the article describes it.

Actually, using the zlib format gets you an unnecessary 2-byte header and 4-byte footer, so the proper sizes are 80 bytes for the English text and 33 bytes for the Ithkuil.
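The 6-byte difference can be checked directly. This sketch uses Python's `zlib` module with a negative `wbits` value, which asks for a raw deflate stream without the zlib header and Adler-32 trailer. The helper names are made up, and level 9 is an assumption, so the absolute sizes may not match the figures above even though the difference should.

```python
import zlib


def zlib_size(data: bytes, level: int = 9) -> int:
    """Size in the zlib format (header + deflate data + Adler-32 trailer)."""
    return len(zlib.compress(data, level))


def raw_deflate_size(data: bytes, level: int = 9) -> int:
    """Size of the bare deflate stream; wbits=-15 drops header and trailer."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return len(compressor.compress(data) + compressor.flush())


if __name__ == "__main__":
    sample = ("On the contrary, I think it may turn out that this rugged "
              "mountain range trails off at some point").encode("ascii")
    print("zlib format:", zlib_size(sample))
    print("raw deflate:", raw_deflate_size(sample))
    print("overhead:", zlib_size(sample) - raw_deflate_size(sample))
```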

I'm having trouble figuring out what's going on with lzma because the spec is lying about the header, so I won't attempt to guess the correct number there.